Question

Fastest algorithm for filling overlapping rectangles of pixels

I have an image. The pixel value is either 0 or 1. All pixels are initially zero.

Given a set of overlapping rectangles characterized by (bottom left pixel coordinate, width, height), I need to flip every pixel inside a rectangle. The following figure shows a toy example:

enter image description here

I have tens of millions of such images to fill. Each image has tens of thousands of pixels. The overlapping rectangles associated to each image are quite many.

The naïve approach is to just iterate through the set of rectangles for every image. But in our case, because the number of rectangles and the overlapping areas are all huge, the computational wastes are substantial.

I intend to code an algorithm by first finding the union sets for each column, like the following:

enter image description here

But before committing to it, is there a well established algorithm for solving this problem? It seems quite fundamental in computer vision / image processing.

Update, more context: the problem is related to nearest neighbor search across a large set of simulated hurricane events. A hurricane is a spatial series originally characterized by a sequence of pixels on an image. For each pixel, we engineer more features by coloring pixels in the same neighborhood. The neighborhood is usually a rectangle or a square. The feature engineering is run multiple times with increasing rectangle size, so each hurricane ends up with multiple images. These images are then flattened and concatenated to a bit array as the final feature vector for the hurricane. Jaccard distance is then used for searching the nearest neighbor. By profiling the code the bottleneck is the feature engineering, which prompted me to think about better way of filling those bit vectors.

 3  88  3
1 Jan 1970

Solution

 3

Using with a very naive form of Sweep line algorithm :

rects = [box(x, y, x + w, y + h) for (x, y), w, h in R]

uu = unary_union(rects)
_, ymin, _, ymax = uu.bounds
xs = sorted({x for x, y in uu.exterior.coords})
gcollections = [split(uu, LineString([(x, ymin), (x, ymax)])) for x in xs[1:-1]]

hsegments = []
for i, (lg, rg) in enumerate(pairwise(gcollections)):
    lp1, rp1, lp2, rp2 = chain.from_iterable(
        map(lambda geom: sorted(
            geom, key=lambda p: p.bounds[0]
        ), [lg.geoms, rg.geoms])
    )
    hseg = rp1.intersection(lp2)
    if i == 0:
        hsegments.append(lp1)
    hsegments.append(hseg)
    if i == (len(gcollections) - 2):
        hsegments.append(rp2)

enter image description here

Setup :

from itertools import chain, pairwise

import matplotlib.pyplot as plt
from shapely import LineString, MultiLineString, Polygon, box, unary_union
from shapely.ops import split
from shapely.plotting import plot_polygon

# set of rectangles (op)
R = {
    ((60, 100), 89, 256),
    ((210, 58), 180, 212),
    ((120, 229), 209, 256),
    ((1, 143), 449, 171),
}
2024-07-02
Timeless

Solution

 1

Filling a rectangle is a relatively cheap operation, and trivial to implement (with any library designed to handle images or 2D arrays, you can have a working implementation in minutes, literally). If this turns out too slow for your application (don't make assumptions, try it out first), you could implement something much more complex that touches each pixel only once as follows:

  1. Compute the intersection of the rectangles. This is a polygon. This is a non-trivial algorithm to implement correctly, I would use an existing implementation. For example, in C++ you have Boost Geometry or Clipper. The latter is also available in Delphi and C#. I really like Clipper.

  2. Draw the polygon using the polygon filling algorithm. This also is a non-trivial algorithm to get right, use an existing implementation if possible. I found this explanation of the algorithm to be very helpful.


In a comment you said that this was "relevant to nearest neighbor search among tens of millions of simulated hurricane footprints". Nearest neighbor search is best done using an R*-tree. Again, this is not a trivial algorithm, you should use an existing implementation. Boost Geometry, linked above, implements R*-trees.

2024-07-02
Cris Luengo