Neighbor graphs capture relationships among data points and are widely used in data analytics and AI workloads. Many studies have explored approximate construction methods for single-node systems, including GPUs. However, extending these methods to distributed systems, both to handle larger datasets and to accelerate construction further, remains challenging because of their irregular computation patterns. We present SOLANET, a GPU-accelerated toolkit for distributed neighbor graph construction. SOLANET first partitions the data and constructs a local graph on each GPU, then refines these local graphs via approximate nearest neighbor (ANN) searches over remote graphs pulled from other GPUs using MPI one-sided operations. SOLANET also provides a lock-free single-GPU neighbor graph construction algorithm for AMD GPUs. On a single MI300A APU, our single-GPU implementation outperforms a state-of-the-art GPU-based approximate neighbor graph construction implementation across multiple datasets. Furthermore, SOLANET achieves an 11x speedup when scaling from 32 to 512 APUs on 1 billion data points and a 6.9x speedup when scaling from 64 to 512 APUs on 2 billion points.
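The two-phase idea (build a local graph per partition, then refine it against the other partitions) can be illustrated with a minimal single-process sketch. This is not SOLANET's implementation: it assumes a round-robin partitioning, replaces the ANN search over remote graphs with an exhaustive scan of each remote partition, and uses plain Python lists where the toolkit uses GPUs and MPI one-sided operations. All function names (`sq_dist`, `local_knn`, `refine`, `build`) are hypothetical.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_knn(points, ids, k):
    """Phase 1: brute-force k-NN graph restricted to one partition.

    Returns {point_id: sorted list of (distance, neighbor_id)}.
    """
    graph = {}
    for i in ids:
        cands = [(sq_dist(points[i], points[j]), j) for j in ids if j != i]
        graph[i] = heapq.nsmallest(k, cands)
    return graph


def refine(points, graph, own_ids, remote_ids, k):
    """Phase 2: merge each local neighbor list with candidates from a
    remote partition. Assumption: an exhaustive scan stands in for the
    ANN search over a pulled remote graph."""
    for i in own_ids:
        remote = [(sq_dist(points[i], points[j]), j) for j in remote_ids]
        graph[i] = heapq.nsmallest(k, graph[i] + remote)


def build(points, num_parts, k):
    """Partition round-robin, build local graphs, then refine pairwise."""
    ids = list(range(len(points)))
    parts = [ids[p::num_parts] for p in range(num_parts)]
    graph = {}
    for part in parts:
        graph.update(local_knn(points, part, k))
    for p, own in enumerate(parts):
        for q, remote in enumerate(parts):
            if p != q:
                refine(points, graph, own, remote, k)
    return graph


if __name__ == "__main__":
    random.seed(0)
    pts = [(random.random(), random.random()) for _ in range(40)]
    g = build(pts, num_parts=4, k=3)
    print(g[0])  # the three nearest neighbors of point 0
```

Because the refinement here scans every remote point, the result is the exact k-NN graph; an approximate search over remote graphs would trade some of that accuracy for far less computation and communication.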