Question

segment tree with range intersection

Recently I came up with a range problem when thinking about a hobby project.

Consider a big binary array indexed from 0 with size A("[0, A)"), I want to support 3 range (the ranges are defined to be "[left, right)" but isn't very important and can be changed on demand) operations:

  1. RangeSet(v, left, right) that sets a range to a value. For example 00000 after RangeSet(1, 1, 3) would be 01100.
  2. RangeIntersectQuery(left, right) which finds the intersection between a range and all the 1s(or 0s) in the array, then merging the result into a list of ranges. For example, RangeIntersectQuery(1, 5) on 011010010 would be a list of ranges ([1, 3), [4, 5)). I'm not sure whether the order of the ranges would matter in the future, but likely that the list can be degraded to an unordered collection.
  3. RangeQuery(left, right) which checks if an interval is full of 1s(or 0s). For example, RangeQuery(3, 5) on 00010110 would be false while RangeQuery(5, 7) would be true.

Ideally, all operations only concerns 1s, but supporting 0s shouldn't create much trouble.

I was thinking that segment tree should be able to solve operation 1 and 3 in O(logA) time and O(k*A) total memory. However, I've trouble coming up with a good way to handle operation 2.

A can be up to 10^9. RangeSet's range length is likely at most 10^3. RangeQuery's length is probably around 10^6-10^7. RangeIntersectQuery's length should be around 10^8-10^9. I don't really want O(A) time on the 3 operations, I want them to be sublinear.

I'm looking for an algorithm/data structure that can solve my problem.

 2  57  2
1 Jan 1970

Solution

 0

Here's a different idea: encode the binary array as a binary function that maps an index (interpreted as an array of binary variables) to a boolean value, represent that function as an ROBDD (from now on I will just write BDD and imply that it would be reduced and ordered ie ROBDD).

RangeSet(1, l, r) becomes: create an BDD of the range-predicate l <= index < r (this always results in a small BDD with O(log A) nodes) and OR it with the BDD that encodes the binary array.

RangeIntersectQuery: again start by creating the range-predicate, but then AND the predicate-BDD with the array-BDD. The result is a BDD that encodes which entries are set within the range. While that's not a list of ranges, it's a fairly flexible way to represent the intersection - you can perform various operations on that BDD efficiently (counting the True entries in time depending on the size of that BDD instead of the number of entries, OR the whole result into another BDD, other things), but also just enumerate all True entries if that's what you want to do.

RangeQuery: one way to do this is by using RangeIntersectQuery and then counting how many True entries there are in the intersection (there is a simple algorithm for that which scales in the number of nodes of the BDD representing the intersection), compare that with the size of the range. As a shortcut, you could invert the range-predicate and OR it with the array-BDD, then you would get the trivial BDD ⊤ iff the queried range was entirely True, no counting required.

For all of these things (and you could do a lot more), the speed depends primarily on the size of BDD (the number of nodes), not directly on A, though of course the number of nodes can only be large if A is large. The BDD that encodes the array would be large if the contents of the array are very chaotic, but if there is some order to it then the BDD will usually not be large, with some exceptions where the order can be described but cannot be captured by the reducing mechanism of a BDD.

2024-07-24
user555045