Approximate Nearest Neighbor Search (ANNS) has become fundamental to modern deep learning applications, and has gained particular prominence through its integration into recent generative models, which work with increasingly complex datasets and higher vector dimensions. Existing CPU-only solutions, even the most efficient graph-based ones, struggle to meet these growing computational demands, while GPU-only solutions are limited by GPU memory capacity. To address this, we propose PilotANN, a hybrid CPU-GPU system for graph-based ANNS that uses both the CPU's abundant RAM and the GPU's parallel processing capabilities. Our approach decomposes the graph traversal process of top-$k$ search into three stages: GPU-accelerated subgraph traversal using SVD-reduced vectors, CPU refinement, and precise search using complete vectors. Furthermore, we introduce fast entry selection to improve search starting points while maximizing GPU utilization. Experimental results demonstrate that PilotANN achieves a $3.9\times$ to $5.4\times$ speedup in throughput on 100-million scale datasets, and can handle datasets up to $12\times$ larger than the GPU memory. A complete open-source implementation is available at https://github.com/ytgui/PilotANN.
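The idea of searching coarsely on SVD-reduced vectors and then refining with complete vectors can be illustrated with a minimal NumPy sketch. This is not the PilotANN implementation: it runs entirely on the CPU, it substitutes a brute-force scan for graph traversal in both steps, and it omits entry selection. The function names `fit_svd_projection` and `two_stage_search`, and the parameters `reduced_dim` and `n_candidates`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_svd_projection(data, reduced_dim):
    """Return (mean, proj) so that (x - mean) @ proj keeps the top singular directions."""
    mean = data.mean(axis=0)
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:reduced_dim].T


def two_stage_search(query, data, reduced, mean, proj, k, n_candidates):
    """Coarse search on reduced vectors, then exact re-ranking on full vectors."""
    # Coarse step (brute-force stand-in for traversal on reduced vectors).
    q_red = (query - mean) @ proj
    coarse = np.sum((reduced - q_red) ** 2, axis=1)
    candidates = np.argpartition(coarse, n_candidates - 1)[:n_candidates]
    # Precise step: re-rank only the candidates using the complete vectors.
    exact = np.sum((data[candidates] - query) ** 2, axis=1)
    return candidates[np.argsort(exact)[:k]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.standard_normal((2000, 32)).astype(np.float32)
    mean, proj = fit_svd_projection(data, reduced_dim=8)
    reduced = (data - mean) @ proj  # the smaller copy a GPU would hold
    query = rng.standard_normal(32).astype(np.float32)
    print(two_stage_search(query, data, reduced, mean, proj, k=10, n_candidates=200))
```

In this sketch only `reduced` would need to reside in the smaller memory, while `data` stays in host RAM. `n_candidates` trades recall against the cost of the precise step: a larger value re-ranks more points with full vectors.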