Non-point spatial objects (e.g., polygons and linestrings) are ubiquitous. We study the problem of indexing non-point objects in memory for range queries and spatial intersection joins. We propose a secondary partitioning technique for space-oriented partitioning indices (e.g., grids) that significantly improves their performance by avoiding both the generation and the elimination of duplicate results. Our approach is easy to implement and can be used by any space-partitioning index to reduce the cost of range queries and intersection joins. In addition, the secondary partitions can be processed independently, which makes our method suitable for distributed and parallel indexing. Experiments on real datasets confirm the advantage of our approach over alternative duplicate elimination techniques and over state-of-the-art data-oriented spatial indices. We also show that our partitioning technique, paired with optimized partition-to-partition join algorithms, typically reduces the cost of spatial joins by around 50%.
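To make the idea of duplicate avoidance concrete, the following is a minimal sketch of how a secondary partitioning inside grid tiles might work. The abstract does not spell out the scheme, so everything here is an illustrative assumption: each tile splits its objects into four classes according to whether the object's bounding rectangle begins inside the tile in the x and y dimensions, and a range query skips any class whose objects are guaranteed to be reported from an earlier tile. The names `Rect`, `TwoLevelGrid`, `insert`, and `range_query` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned bounding rectangle (closed on all sides)."""
    xl: float
    yl: float
    xu: float
    yu: float


def intersects(a: Rect, b: Rect) -> bool:
    return a.xl <= b.xu and b.xl <= a.xu and a.yl <= b.yu and b.yl <= a.yu


class TwoLevelGrid:
    """Hypothetical n x n grid over [0, extent]^2 with per-tile classes.

    Assumption: the secondary partition key of an object in a tile is
    (starts_in_x, starts_in_y), i.e. whether the object's rectangle begins
    in this tile's column and row. This gives four classes per tile.
    """

    def __init__(self, extent: float, n: int):
        self.n = n
        self.cell = extent / n
        # tiles[(i, j)][(starts_in_x, starts_in_y)] -> list of (id, rect)
        self.tiles = defaultdict(lambda: defaultdict(list))

    def _idx(self, v: float) -> int:
        # Map a coordinate to a tile index, clamped to the grid.
        return min(self.n - 1, max(0, int(v // self.cell)))

    def insert(self, oid, r: Rect) -> None:
        i0, i1 = self._idx(r.xl), self._idx(r.xu)
        j0, j1 = self._idx(r.yl), self._idx(r.yu)
        # The object is replicated to every tile it overlaps, but is
        # tagged with the class that records where it begins.
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                self.tiles[(i, j)][(i == i0, j == j0)].append((oid, r))

    def range_query(self, w: Rect) -> list:
        i0, i1 = self._idx(w.xl), self._idx(w.xu)
        j0, j1 = self._idx(w.yl), self._idx(w.yu)
        out = []
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                tile = self.tiles.get((i, j))
                if tile is None:
                    continue
                for (starts_x, starts_y), objs in tile.items():
                    # Scan a class only if, in each dimension, its objects
                    # begin in this tile or this is the query's first tile.
                    # Otherwise the same objects are reported from an
                    # earlier tile, so no duplicate is ever produced.
                    if (starts_x or i == i0) and (starts_y or j == j0):
                        out.extend(o for o, r in objs if intersects(r, w))
        return out
```

Under these assumptions, an object that overlaps the query is reported from exactly one tile, so no hash set or reference-point test is needed afterwards, and each class of each tile can be scanned independently of the others. For example, after `g = TwoLevelGrid(10.0, 4)` and `g.insert("a", Rect(1, 1, 9, 9))`, the call `g.range_query(Rect(2, 2, 8, 8))` reports `"a"` once even though the object is stored in all sixteen tiles.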