Raster Interval Object Approximations for Spatial Intersection Joins

Spatial join processing techniques that identify intersections between complex geometries (e.g., polygons) commonly follow a two-step filter-and-refine pipeline: the filter step evaluates the query predicate on the minimum bounding rectangles (MBRs) of the objects, and the refinement step eliminates false positives by evaluating the predicate on the exact geometries. We propose a raster interval approximation of object geometries and introduce a powerful intermediate step in the pipeline. In a preprocessing phase, our method (i) rasterizes each object geometry using a fine grid, (ii) models groups of nearby cells that intersect the polygon as an interval, and (iii) encodes each interval by a bitstring that captures the overlap of each cell in it with the polygon. Going one step further, we improve our approach to approximate each object by two sets of intervals that succinctly capture the raster cells which (i) intersect the object and (ii) are fully contained in the object. Using this representation, we show that we can verify whether two polygons intersect by a sequence of joins between the interval sets, each of which takes linear time. Our approximations can be compressed effectively and can be customized for use on partitioned data and on polygons of varying sizes, rasterized at different granularities. Finally, we propose a novel algorithm that computes the interval approximation of a polygon without fully rasterizing it first, making the computation of approximations orders of magnitude faster. Experiments on real data demonstrate the effectiveness and efficiency of our proposal over previous work.
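
The intermediate filter described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that raster cells are identified by integer ids taken from some fixed linear ordering of the grid (the ordering is not specified here; a space-filling curve such as a Hilbert curve is a typical choice), that an object's "all" set holds every cell whose closed area meets the object, and that its "full" set holds the cells lying entirely inside it. The function names and the reject/accept/refine sequence are hypothetical, and the per-interval bitstrings are not modeled.

```python
from typing import Iterable, List, Tuple

# Half-open range [start, end) of consecutive cell ids (hypothetical encoding).
Interval = Tuple[int, int]


def cells_to_intervals(cell_ids: Iterable[int]) -> List[Interval]:
    """Merge cell ids into a sorted list of maximal, disjoint half-open intervals."""
    intervals: List[Interval] = []
    for c in sorted(set(cell_ids)):
        if intervals and intervals[-1][1] == c:
            # Cell c directly follows the last interval: extend it by one cell.
            intervals[-1] = (intervals[-1][0], c + 1)
        else:
            intervals.append((c, c + 1))
    return intervals


def intervals_overlap(a: List[Interval], b: List[Interval]) -> bool:
    """Single merge pass over two sorted, disjoint interval lists (linear time)."""
    i = j = 0
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            return True  # the two intervals share at least one cell id
        # Advance the interval that ends first; it cannot meet any later one.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return False


def intersection_filter(all_r: List[Interval], full_r: List[Interval],
                        all_s: List[Interval], full_s: List[Interval]) -> str:
    """Classify a candidate pair as 'reject', 'accept', or 'refine'."""
    if not intervals_overlap(all_r, all_s):
        return "reject"  # no common cell, so the geometries cannot intersect
    if intervals_overlap(all_r, full_s) or intervals_overlap(full_r, all_s):
        return "accept"  # a cell touched by one object lies inside the other
    return "refine"  # inconclusive: fall back to the exact geometry test


# Toy example on a grid whose cells are numbered 0..15 (made-up data).
all_r, full_r = cells_to_intervals([0, 1, 2, 5, 6]), cells_to_intervals([1, 5])
all_s, full_s = cells_to_intervals([6, 7, 10, 11]), cells_to_intervals([11])
all_t, full_t = cells_to_intervals([8, 9]), cells_to_intervals([])
all_u, full_u = cells_to_intervals([1, 2]), cells_to_intervals([])

print(all_r)                                            # [(0, 3), (5, 7)]
print(intersection_filter(all_r, full_r, all_s, full_s))  # refine
print(intersection_filter(all_r, full_r, all_t, full_t))  # reject
print(intersection_filter(all_u, full_u, all_r, full_r))  # accept
```

Each test is one merge pass over two sorted interval lists, so a candidate pair is classified in time linear in the number of intervals, and only pairs labeled "refine" reach the exact geometry test. The "accept" rule is sound because a cell that meets one object and lies fully inside the other contains a common point of both.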

翻译：空间连接处理技术通常采用两步过滤-精炼流程来识别复杂几何体（如多边形）之间的交集：过滤阶段评估对象最小外接矩形上的查询谓词，精炼阶段通过将查询应用于精确几何体来消除误报。我们提出了一种基于栅格区间的对象几何近似方法，并在流程中引入了一个强大的中间步骤。在预处理阶段，我们的方法：（i）使用精细网格对每个对象几何体进行栅格化，（ii）将与多边形相交的相邻单元格组建模为区间，（iii）通过比特串编码每个区间，捕获其中各单元格与多边形的重叠情况。进一步，我们改进了该方法，用两组区间近似每个对象，这两组区间简洁地捕获了（i）与对象相交的栅格单元格以及（ii）完全包含在对象内的栅格单元格。利用这种表示，我们证明了可以通过区间集合之间的线性时间连接序列来验证两个多边形是否相交。我们的近似可以有效地压缩，并可定制用于分区数据以及不同粒度的栅格化多边形。最后，我们提出了一种新算法，无需完全栅格化多边形即可计算其区间近似，从而使近似计算速度提高数个数量级。真实数据集上的实验证明了我们方法相比先前工作的有效性和高效性。