k-approximate nearest neighbor search (k-ANNS) in high-dimensional vector spaces is a fundamental problem across many fields. With the advent of vector databases and retrieval-augmented generation, k-ANNS has garnered increasing attention. Among existing methods, proximity graph (PG) based approaches are the state of the art (SOTA). However, the construction parameters of a PG significantly impact its search performance, so it is essential to tune these parameters before constructing a PG for a given dataset. A tuning procedure first recommends a set of promising parameter configurations and then evaluates the quality of each one by building the corresponding PG and testing its k-ANNS performance. Because the construction complexity of PGs is superlinear, building and evaluating graph indexes accounts for the primary cost of parameter tuning. Unfortunately, no existing method has considered and optimized this process. In this paper, we introduce FastPGT, an efficient framework for tuning PG construction parameters. FastPGT accelerates parameter evaluation by building multiple PGs simultaneously, thereby reducing repeated computations. Moreover, we modify the SOTA tuning model to recommend multiple parameter configurations at once, which can then be evaluated efficiently by our simultaneous construction method. Through extensive experiments on real-world datasets, we demonstrate that FastPGT achieves up to a 2.37x speedup over the SOTA method VDTuner without compromising tuning quality.
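To make the idea of "reducing repeated computations" concrete, the following is a minimal sketch and not the FastPGT algorithm itself. The abstract does not say which computations are shared, so this toy assumes it is pairwise distance evaluations and shares them through a cache. The graph builder is a simplified incremental construction, and all names (`CachedDistance`, `build_graph`, `tune_builds`, `max_degree`, `ef`) are hypothetical.

```python
import heapq
import random


class CachedDistance:
    """Squared Euclidean distance with a memo table (an assumed sharing mechanism)."""

    def __init__(self, points):
        self.points = points
        self.cache = {}
        self.computed = 0  # number of distances actually evaluated

    def __call__(self, i, j):
        key = (i, j) if i < j else (j, i)
        if key not in self.cache:
            self.computed += 1
            p, q = self.points[i], self.points[j]
            self.cache[key] = sum((a - b) ** 2 for a, b in zip(p, q))
        return self.cache[key]


def build_graph(n, dist, max_degree, ef):
    """Toy incremental proximity graph: insert points one by one and link each
    new point to the closest nodes found by a best-first search from node 0."""
    graph = {0: []}
    for q in range(1, n):
        d0 = dist(q, 0)
        visited = {0}
        candidates = [(d0, 0)]   # min-heap of nodes to expand
        results = [(-d0, 0)]     # max-heap (negated) of the best ef nodes so far
        while candidates:
            d, u = heapq.heappop(candidates)
            if len(results) >= ef and d > -results[0][0]:
                break  # closest open candidate is worse than the worst result
            for v in graph[u]:
                if v in visited:
                    continue
                visited.add(v)
                dv = dist(q, v)
                if len(results) < ef or dv < -results[0][0]:
                    heapq.heappush(candidates, (dv, v))
                    heapq.heappush(results, (-dv, v))
                    if len(results) > ef:
                        heapq.heappop(results)
        nearest = sorted((-nd, v) for nd, v in results)[:max_degree]
        graph[q] = [v for _, v in nearest]
        for _, v in nearest:
            graph[v].append(q)
            if len(graph[v]) > max_degree:
                # Prune back to the max_degree closest neighbors of v.
                graph[v] = sorted(graph[v], key=lambda w: dist(v, w))[:max_degree]
    return graph


def tune_builds(points, configs, share_cache):
    """Build one graph per (max_degree, ef) configuration and return the graphs
    plus the total number of distance evaluations."""
    shared = CachedDistance(points)
    graphs, total = [], 0
    for max_degree, ef in configs:
        dist = shared if share_cache else CachedDistance(points)
        graphs.append(build_graph(len(points), dist, max_degree, ef))
        if not share_cache:
            total += dist.computed
    return graphs, (shared.computed if share_cache else total)


if __name__ == "__main__":
    random.seed(0)
    pts = [[random.random() for _ in range(8)] for _ in range(300)]
    cfgs = [(4, 8), (8, 16), (16, 32)]  # hypothetical candidate configurations
    _, separate = tune_builds(pts, cfgs, share_cache=False)
    _, shared = tune_builds(pts, cfgs, share_cache=True)
    print("separate builds:", separate, "distance evaluations")
    print("shared cache:   ", shared, "distance evaluations")
```

In this toy, the graphs produced are identical with or without sharing; only the number of distance evaluations changes, which mirrors the claim that tuning quality is preserved while cost drops. A real system would need a sharing scheme that fits in memory at scale, which this sketch does not address.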