Approximate nearest neighbor (ANN) search is widely used in modern intelligent applications such as recommendation systems and vector databases, so efficient, high-throughput ANN search has become increasingly important. In this paper, we first characterize the state-of-the-art product quantization (PQ)-based ANN search method and identify a significant source of inefficiency: unnecessary pairwise distance calculations and accumulations. To address it, we propose JUNO, an end-to-end ANN search system built around a carefully designed sparsity- and locality-aware search algorithm. We also present an efficient hardware mapping that executes this algorithm on the ray tracing (RT) cores of modern GPUs, pipelined with execution on tensor cores. Evaluations on four datasets ranging from 1 million to 100 million search points show 2.2x-8.5x improvements in search throughput. Moreover, our algorithmic enhancements alone achieve up to a 2.6x improvement on hardware without RT core acceleration.
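For readers unfamiliar with the PQ baseline, the following is a minimal sketch of standard PQ asymmetric distance computation (ADC) in plain Python. It shows the two steps the characterization refers to: building a table of pairwise distances between each query sub-vector and every codeword, then accumulating one table entry per subspace for each encoded database vector. All function names and the toy codebooks are hypothetical illustrations; this is the generic PQ scheme, not JUNO's sparsity-aware algorithm or its RT core mapping.

```python
from typing import List, Sequence

Codebooks = List[List[List[float]]]  # codebooks[j][k] = k-th codeword of subspace j


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v: Sequence[float], m: int) -> List[List[float]]:
    """Split a vector into m equal-length sub-vectors (assumes len(v) % m == 0)."""
    d = len(v) // m
    return [list(v[i * d:(i + 1) * d]) for i in range(m)]


def encode(x: Sequence[float], codebooks: Codebooks) -> List[int]:
    """PQ-encode a vector: nearest codeword index in each subspace."""
    subs = split(x, len(codebooks))
    return [
        min(range(len(cb)), key=lambda k: sq_dist(subs[j], cb[k]))
        for j, cb in enumerate(codebooks)
    ]


def build_lut(query: Sequence[float], codebooks: Codebooks) -> List[List[float]]:
    """Pairwise step: lut[j][k] = ||q_j - c_{j,k}||^2 for every codeword."""
    q_sub = split(query, len(codebooks))
    return [[sq_dist(q_sub[j], c) for c in cb] for j, cb in enumerate(codebooks)]


def adc_distance(codes: Sequence[int], lut: List[List[float]]) -> float:
    """Accumulation step: sum one table entry per subspace."""
    return sum(lut[j][k] for j, k in enumerate(codes))


def search(query: Sequence[float], db_codes: List[List[int]],
           codebooks: Codebooks, k: int) -> List[int]:
    """Exhaustive PQ search: indices of the k smallest approximate distances."""
    lut = build_lut(query, codebooks)
    dists = [adc_distance(c, lut) for c in db_codes]
    return sorted(range(len(dists)), key=lambda i: dists[i])[:k]


if __name__ == "__main__":
    # Toy setup: 4-D vectors, 2 subspaces, 2 codewords per subspace.
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[0.0, 0.0], [2.0, 2.0]],
    ]
    database = [[1, 1, 0, 0], [0, 0, 2, 2], [1, 1, 2, 2]]
    db_codes = [encode(x, codebooks) for x in database]
    query = [1.0, 1.0, 0.0, 0.0]
    print(db_codes)                               # [[1, 0], [0, 1], [1, 1]]
    print(search(query, db_codes, codebooks, 2))  # [0, 2]
```

In this baseline, every table entry is computed and every encoded point sums one entry per subspace, regardless of how far a codeword lies from the query; this is the kind of work the abstract identifies as partly unnecessary.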