Approximate Nearest Neighbor Search (ANNS) underpins many large-scale data mining and machine learning applications, with efficient retrieval increasingly hinging on GPU acceleration as dataset sizes grow. Although graph-based approaches represent the state of the art in approximate nearest neighbor search, there is a lack of systematic understanding regarding their optimization for modern GPU architectures and their end-to-end effectiveness in practical scenarios. In this work, we present a comprehensive survey and experimental study of GPU-accelerated graph-based vector search algorithms. We establish a detailed taxonomy of GPU optimization strategies and clarify the mapping between algorithmic tasks and hardware execution units within GPUs. Through a thorough evaluation of six leading algorithms on eight large-scale benchmark datasets, we assess both graph index construction and query search performance. Our analysis reveals that distance computation remains the primary computational bottleneck, while data transfer between the host CPU and GPU emerges as the dominant factor influencing real-world latency at large scale. We also highlight key trade-offs in scalability and memory usage across different system designs. Our findings offer clear guidelines for designing scalable and robust GPU-powered approximate nearest neighbor search systems, and provide a comprehensive benchmark for the knowledge discovery and data mining community.