With advances in information retrieval, recommendation systems, and Retrieval-Augmented Generation (RAG), Approximate Nearest Neighbor Search (ANNS) has gained widespread adoption because it combines high search performance with high accuracy. While several disk-based ANNS systems have emerged to handle exponentially growing vector datasets, they suffer from suboptimal performance due to two inherent limitations: 1) they fail to overlap SSD accesses with distance computation, and 2) they incur extended I/O latency caused by a suboptimal I/O stack. To address these challenges, we present FlashANNS, a GPU-accelerated, out-of-core, graph-based ANNS system built around I/O-compute overlapping. Our core insight lies in the synchronized orchestration of I/O and computation through three key innovations: 1) Dependency-relaxed asynchronous pipeline: FlashANNS decouples I/O-computation dependencies to fully overlap GPU distance calculations with SSD data transfers. 2) Warp-level concurrent SSD access: FlashANNS implements a lock-free I/O stack with warp-level concurrency control to reduce latency-induced overhead. 3) Computation-I/O balanced graph degree selection: FlashANNS selects graph degrees via lightweight compute-to-I/O ratio sampling, balancing computational load against storage access latency across different I/O bandwidth configurations. We implement FlashANNS and compare it with state-of-the-art out-of-core ANNS systems (SPANN, DiskANN) and a GPU-accelerated out-of-core ANNS system (FusionANNS). Experimental results demonstrate that at $\geq$95\% recall@10, FlashANNS achieves 2.3-5.9$\times$ higher throughput than existing SOTA methods with a single SSD, and 2.7-12.2$\times$ higher throughput in multi-SSD configurations.
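To illustrate the idea behind computation-I/O balanced graph degree selection, the following is a minimal sketch and not the FlashANNS implementation. The abstract does not specify the selection rule, so the sketch assumes that, in a fully overlapped pipeline, the per-step cost is bounded by the slower of compute and I/O, and that a reasonable target is therefore a sampled compute-to-I/O time ratio close to 1. The function name `pick_balanced_degree` and the two toy cost models are hypothetical and stand in for real measurements.

```python
from typing import Callable, Iterable


def pick_balanced_degree(
    candidate_degrees: Iterable[int],
    compute_time: Callable[[int], float],
    io_time: Callable[[int], float],
) -> int:
    """Return the candidate degree whose compute/I-O time ratio is closest to 1.

    Assumption (not from the paper): "balanced" means ratio == 1, so that
    neither the GPU nor the SSD sits idle in an overlapped pipeline.
    """

    def imbalance(degree: int) -> float:
        ratio = compute_time(degree) / io_time(degree)
        return abs(ratio - 1.0)

    return min(candidate_degrees, key=imbalance)


# Hypothetical cost models in microseconds per search step. In practice these
# would come from timing a small sample of queries on the target hardware.
def toy_compute_time(degree: int) -> float:
    # Distance computations grow with the number of neighbors per node.
    return 0.5 * degree


def toy_io_time(degree: int) -> float:
    # Fixed SSD access latency plus a transfer cost that grows with node size.
    return 20.0 + 0.1 * degree


chosen = pick_balanced_degree([16, 32, 64, 128], toy_compute_time, toy_io_time)
print(chosen)
```

Under these toy models, a faster storage setup (a lower `io_time`) would shift the choice toward a smaller degree, which matches the abstract's point that the selected degree depends on the available I/O bandwidth.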