Approximate Nearest Neighbor Search (ANNS) plays a critical role in disciplines spanning data mining and artificial intelligence, from information retrieval and computer vision to natural language processing and recommender systems. Data volumes have soared in recent years, and the computational cost of exhaustive exact nearest neighbor search is often prohibitive, which makes approximate techniques necessary. Among ANNS algorithms, graph-based approaches have recently attracted significant attention for their balance of performance and recall. However, despite the widespread availability of massively parallel, general-purpose computing hardware, only a few studies have explored harnessing GPUs and multi-core processors for these methods. To bridge this gap, we introduce a novel proximity graph construction and search algorithm designed for parallel computing hardware. By leveraging the high-performance capabilities of modern hardware, our approach achieves substantial efficiency gains. In particular, our method constructs the proximity graph faster than existing CPU- and GPU-based methods and achieves higher throughput in both large- and small-batch searches while maintaining comparable accuracy. In graph construction time, our method, CAGRA, is 2.2–27× faster than HNSW, one of the state-of-the-art CPU implementations. In large-batch query throughput in the 90% to 95% recall range, CAGRA is 33–77× faster than HNSW and 3.8–8.8× faster than the state-of-the-art GPU implementations. For a single query, CAGRA is 3.4–53× faster than HNSW at 95% recall.
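The abstract contrasts exhaustive exact search with graph-based approximate search and measures quality by recall. The sketch below illustrates those three ideas in plain Python. It is not CAGRA's construction or GPU search kernel, neither of which is described here. Instead it uses a naive k-NN graph and a generic best-first (greedy) search. The function names and the `degree` and `list_size` parameters are hypothetical, chosen only for illustration.

```python
import heapq
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_knn(data, query, k):
    """Exhaustive exact search: compare the query against every vector."""
    return sorted(range(len(data)), key=lambda i: sq_dist(data[i], query))[:k]


def build_knn_graph(data, degree):
    """Naive O(N^2) proximity graph: link each node to its `degree` nearest others."""
    graph = []
    for i, v in enumerate(data):
        others = (j for j in range(len(data)) if j != i)
        graph.append(sorted(others, key=lambda j: sq_dist(data[j], v))[:degree])
    return graph


def greedy_search(data, graph, query, k, list_size, entry_points):
    """Best-first graph search keeping at most `list_size` best candidates."""
    visited = set(entry_points)
    frontier = [(sq_dist(data[e], query), e) for e in entry_points]
    heapq.heapify(frontier)                # min-heap of nodes not yet expanded
    best = sorted(frontier)[:list_size]    # sorted list of best nodes found so far
    while frontier:
        d, node = heapq.heappop(frontier)
        if len(best) >= list_size and d > best[-1][0]:
            break                          # no remaining node can improve the list
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            nd = sq_dist(data[nb], query)
            if len(best) < list_size or nd < best[-1][0]:
                heapq.heappush(frontier, (nd, nb))
                best.append((nd, nb))
                best.sort()
                del best[list_size:]       # keep the list bounded
    return [i for _, i in best[:k]]


def recall_at_k(approx, exact):
    """Fraction of the true k nearest neighbors that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    random.seed(0)
    data = [[random.random() for _ in range(8)] for _ in range(300)]
    graph = build_knn_graph(data, degree=16)
    query = [random.random() for _ in range(8)]
    exact = exact_knn(data, query, k=10)
    approx = greedy_search(data, graph, query, k=10, list_size=64, entry_points=[0, 1, 2])
    print("recall@10:", recall_at_k(approx, exact))
```

The greedy search visits only the neighborhood it expands rather than all N vectors. Enlarging `list_size` generally trades speed for recall. This per-query traversal is the kind of work that GPU-oriented methods aim to parallelize.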