We propose a bounds-only pruning test for exact Euclidean all-k-nearest-neighbor (AkNN) joins on partitioned spatial datasets. Rather than maintaining indexes, data warehouses commonly partition large tables and store per-row-group statistics to accelerate searches and joins. AkNN joins can exploit these statistics to construct bounds and restrict join evaluation to a few partitions, which are then loaded to build spatial indexes. Existing pruning methods are overly conservative on bounds-only spatial data because they do not fully capture its directional semantics, so they miss opportunities to skip unneeded partitions at the earliest stages of a join. Our three-bound proximity test determines whether every point in a partition has a closer neighbor in one partition than in another, potentially occluded, partition. We show that the resulting algorithm is both optimal and efficient.
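To make the gap between conservative and direction-aware pruning concrete, here is a contrast between a conventional box-level test and a tighter pointwise dominance check. This is not the three-bound test proposed here, which this summary does not specify. It is a minimal sketch that assumes each partition exposes per-dimension min/max values and a row count, as columnar formats commonly record per row group. The names `Box`, `global_prune`, and `pointwise_prune` are hypothetical. Let Q be a query partition and A and B candidate partitions. B cannot contribute any of the k nearest neighbors of any q in Q when two conditions hold: A holds at least k rows, and MaxDist(q, A) < MinDist(q, B) for every q in Q. Both squared distances are sums of per-dimension terms, so the worst case over Q separates by dimension. Within each dimension the difference is linear or convex between breakpoints. It therefore suffices to evaluate Q's endpoints, the midpoint of A, and the bounds of B.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical partition summary: per-dimension min/max plus a row count."""
    lo: tuple[float, ...]
    hi: tuple[float, ...]
    rows: int = 0


def max_dist(p: Box, r: Box) -> float:
    """Largest Euclidean distance between any point of box p and any point of box r."""
    return math.sqrt(sum(max(rh - pl, ph - rl) ** 2
                         for pl, ph, rl, rh in zip(p.lo, p.hi, r.lo, r.hi)))


def min_dist(p: Box, r: Box) -> float:
    """Smallest Euclidean distance between box p and box r (0 if they overlap)."""
    return math.sqrt(sum(max(rl - ph, 0.0, pl - rh) ** 2
                         for pl, ph, rl, rh in zip(p.lo, p.hi, r.lo, r.hi)))


def global_prune(q: Box, a: Box, b: Box, k: int) -> bool:
    """Conventional box-level test: skip B if MaxDist(Q, A) < MinDist(Q, B)."""
    return a.rows >= k and max_dist(q, a) < min_dist(q, b)


def _gap_term(x: float, a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> float:
    """One dimension's contribution to MaxDist(q, A)^2 - MinDist(q, B)^2 at q_i = x."""
    far = max(x - a_lo, a_hi - x)          # farthest coordinate distance to A
    near = max(b_lo - x, 0.0, x - b_hi)    # nearest coordinate distance to B
    return far * far - near * near


def pointwise_prune(q: Box, a: Box, b: Box, k: int) -> bool:
    """Skip B if, for every q in Q, all of A is strictly closer than any point of B.

    The squared-distance difference is separable, so its maximum over Q is the
    sum of per-dimension maxima. Each per-dimension maximum is attained at Q's
    endpoints or at a breakpoint (midpoint of A, bounds of B) lying inside Q.
    """
    if a.rows < k:
        return False
    total = 0.0
    for ql, qh, al, ah, bl, bh in zip(q.lo, q.hi, a.lo, a.hi, b.lo, b.hi):
        candidates = {ql, qh}
        for t in ((al + ah) / 2.0, bl, bh):
            if ql <= t <= qh:
                candidates.add(t)
        total += max(_gap_term(x, al, ah, bl, bh) for x in candidates)
    return total < 0.0


if __name__ == "__main__":
    # Q, A, B laid out left to right along x.
    q = Box((0.0, 0.0), (1.0, 1.0), rows=100)
    a = Box((2.0, 0.0), (3.0, 1.0), rows=50)
    b = Box((4.0, 0.0), (5.0, 1.0), rows=50)
    print(global_prune(q, a, b, k=5), pointwise_prune(q, a, b, k=5))
```

In this layout the box-level test fails because MaxDist(Q, A) = √10 exceeds MinDist(Q, B) = 3. The pointwise check still prunes B, because the worst-case squared gap over Q is −4. The pointwise check is itself only a sufficient condition. It illustrates how a bounds-only test that accounts for where Q sits relative to A and B can skip partitions that a single box-to-box comparison keeps.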