Computing fixed-radius near-neighbor graphs is an important first step for many data analysis algorithms. Near-neighbor graphs connect points that are close under some metric, endowing point clouds with a combinatorial structure. As computing power and data acquisition methods advance, large scientific datasets from diverse sources would benefit greatly from scalable solutions to this common subroutine for downstream analysis. Prior work on parallel nearest-neighbor search has made great progress on problems such as k-nearest and approximate nearest neighbor search, with particular attention to Euclidean spaces. Yet many applications need exact solutions and non-Euclidean metrics. This paper presents a scalable, sparsity-aware, distributed-memory algorithm that uses cover trees to compute near-neighbor graphs in general metric spaces. We provide a shared-memory algorithm for cover tree construction and demonstrate its competitiveness with state-of-the-art fixed-radius search data structures. We then introduce two distributed-memory algorithms for the near-neighbor graph problem, a simple point-partitioning strategy and a spatial-partitioning strategy, both of which leverage the cover tree algorithm on each node. Our algorithms exhibit parallel scaling across a variety of real and synthetic datasets for both traditional and non-traditional metrics. On real-world high-dimensional datasets with one million points, we achieve speedups of up to 678.34x over the state of the art using 1024 cores for graphs with an average of 70 neighbors per vertex, and up to 1590.99x using 4096 cores for graphs with an average of 500 neighbors per vertex.
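To make the object being computed concrete, the following is a minimal brute-force sketch of a fixed-radius near-neighbor graph under an arbitrary metric. It is not the cover tree or distributed-memory algorithm described in the abstract; it is only a quadratic-time reference definition. The names `near_neighbor_graph` and `hamming`, the sample strings, and the choice of a closed ball (distance `<= radius`) are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, Set, TypeVar

P = TypeVar("P")


def near_neighbor_graph(
    points: Sequence[P],
    radius: float,
    dist: Callable[[P, P], float],
) -> Dict[int, Set[int]]:
    """Brute-force reference: connect i and j when dist(points[i], points[j]) <= radius.

    Assumption: the ball is closed (<=). Runs in O(n^2) distance evaluations,
    which is exactly the cost that tree-based methods aim to avoid.
    """
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            # The metric is symmetric, so the graph is undirected.
            graph[i].add(j)
            graph[j].add(i)
    return graph


def hamming(a: str, b: str) -> int:
    """Hamming distance on equal-length strings, a simple non-Euclidean metric."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sample data: four equal-length strings.
words = ["karolin", "kathrin", "kerstin", "karolyn"]
graph = near_neighbor_graph(words, radius=3, dist=hamming)
print(graph)
```

Because `dist` is passed in as a plain function, the same sketch works for any metric, which mirrors the abstract's emphasis on general metric spaces. Here "karolin" is within Hamming distance 3 of every other string, while the other three strings are pairwise at distance 4.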