Spatial join processing techniques that identify intersections between complex geometries (e.g., polygons) commonly follow a two-step filter-and-refine pipeline. The filter step evaluates the query predicate on the minimum bounding rectangles (MBRs) of the geometries, while the refinement step eliminates false positives by applying the query on the exact geometries. To accelerate spatial join evaluation over complex geometries, we propose a raster intervals approximation of object geometries and introduce a powerful intermediate step in the pipeline. In a preprocessing phase, our method (i) rasterizes each object geometry using a fine grid, (ii) models groups of nearby cells that intersect the polygon as an interval, and (iii) encodes each interval with a bitstring capturing the overlap of each cell in it with the polygon. Going one step further, we improve our approach by approximating each object by two sets of intervals that succinctly capture the raster cells which (i) intersect with the object and (ii) are fully contained within the object. Using this representation, we show that we can verify whether two polygons intersect through a sequence of linear-time joins between the interval sets. Our approximations are effectively compressible and customizable for partitioned data and polygons of varying sizes, rasterized at different granularities. Finally, we propose a novel algorithm that computes the interval approximation of a polygon without fully rasterizing it first, rendering the computation of approximations orders of magnitude faster. Experiments on real data demonstrate the effectiveness and efficiency of our proposal over previous work.
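As an illustration of the intermediate step, the following minimal Python sketch shows how two interval lists per object could be joined in linear time to decide a candidate pair without touching the exact geometries. It is a sketch under stated assumptions, not the authors' implementation: it assumes both objects are rasterized on the same grid, that each cell has an integer ID under some fixed linear ordering (for example, a space-filling curve), and that intervals are half-open, sorted, and disjoint. The names `cells_to_intervals`, `intervals_overlap`, and `intermediate_filter`, as well as the three result labels, are hypothetical. The per-interval bitstring encoding is not modeled here.

```python
from typing import Iterable, List, Tuple

# Half-open interval [start, end) over integer cell IDs (assumed ordering).
Interval = Tuple[int, int]


def cells_to_intervals(cell_ids: Iterable[int]) -> List[Interval]:
    """Group consecutive cell IDs into sorted, disjoint half-open intervals."""
    intervals: List[Interval] = []
    for c in sorted(set(cell_ids)):
        if intervals and intervals[-1][1] == c:
            # Cell continues the current run: extend the last interval.
            intervals[-1] = (intervals[-1][0], c + 1)
        else:
            intervals.append((c, c + 1))
    return intervals


def intervals_overlap(xs: List[Interval], ys: List[Interval]) -> bool:
    """Merge-join two sorted interval lists; True if any pair shares a cell."""
    i = j = 0
    while i < len(xs) and j < len(ys):
        x_start, x_end = xs[i]
        y_start, y_end = ys[j]
        if x_start < y_end and y_start < x_end:
            return True
        # Advance the list whose current interval ends first.
        if x_end <= y_end:
            i += 1
        else:
            j += 1
    return False


def intermediate_filter(
    a_r: List[Interval], f_r: List[Interval],
    a_s: List[Interval], f_s: List[Interval],
) -> str:
    """Classify a candidate pair (r, s) that passed the MBR filter.

    a_* holds the cells that intersect the object; f_* holds the cells
    fully contained in it (a subset of a_*).
    """
    # No common cell at all: the polygons cannot intersect.
    if not intervals_overlap(a_r, a_s):
        return "no_intersection"
    # A cell touched by one object lies fully inside the other: they intersect.
    if intervals_overlap(a_r, f_s) or intervals_overlap(f_r, a_s):
        return "intersection"
    # Only partially covered cells are shared: defer to exact refinement.
    return "inconclusive"


# Hypothetical objects on a shared grid.
r_all = cells_to_intervals([0, 1, 2, 3, 10, 11, 12, 13])
r_full = cells_to_intervals([1, 2])
s_all, s_full = [(3, 6)], [(4, 5)]
t_all, t_full = [(20, 25)], []
u_all, u_full = [(11, 13)], [(12, 13)]

print(intermediate_filter(r_all, r_full, s_all, s_full))
print(intermediate_filter(r_all, r_full, t_all, t_full))
print(intermediate_filter(r_all, r_full, u_all, u_full))
```

Only pairs classified as inconclusive would need to proceed to the refinement step on exact geometries. Each join makes a single pass over both lists, so the cost is linear in the total number of intervals.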