Vector similarity search is an essential primitive in modern AI and ML applications. Most vector databases adopt graph-based approximate nearest neighbor (ANN) search algorithms, such as DiskANN (Subramanya et al., 2019), which have demonstrated state-of-the-art empirical performance. DiskANN's graph construction is governed by a reachability parameter $\alpha$, which controls a trade-off among construction time, query time, and accuracy. However, adaptively tuning this trade-off typically requires rebuilding the index for each candidate value of $\alpha$, which is prohibitively expensive at scale. In this work, we propose RP-Tuning, an efficient post-hoc routine based on DiskANN's pruning step that adjusts $\alpha$ without reconstructing the full index. Within the $\alpha$-reachability framework of prior theoretical work (Indyk and Xu, 2023; Gollapudi et al., 2025), we prove that pruning an initially $\alpha$-reachable graph with RP-Tuning preserves worst-case reachability guarantees in general metrics and yields improved guarantees in Euclidean metrics. Empirically, we show that RP-Tuning accelerates DiskANN tuning on four public datasets by up to $43\times$ with negligible overhead.
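To make the pruning step concrete, the following is a minimal sketch of DiskANN-style $\alpha$-pruning applied post hoc to an existing graph. It is an illustration under stated assumptions, not the RP-Tuning procedure itself, whose details the abstract does not give. The names `alpha_prune` and `reprune_graph`, the dictionary-based graph layout, and the toy points are hypothetical. The sketch assumes the index was built with a large $\alpha$ and is then re-pruned with a smaller one, using only each node's existing out-neighbors as candidates.

```python
import math


def alpha_prune(point_id, candidate_ids, points, alpha, max_degree):
    """DiskANN-style alpha-pruning of one node's candidate neighbor list.

    Hypothetical helper for illustration; `points` maps id -> coordinate tuple.
    """
    p = points[point_id]
    # Process candidates from closest to farthest, excluding the node itself.
    remaining = sorted(
        (c for c in set(candidate_ids) if c != point_id),
        key=lambda c: math.dist(p, points[c]),
    )
    kept = []
    while remaining and len(kept) < max_degree:
        best = remaining.pop(0)
        kept.append(best)
        # Drop v if alpha * d(best, v) <= d(p, v): the kept neighbor already
        # brings a search sufficiently closer to v, so the edge p -> v is redundant.
        remaining = [
            v for v in remaining
            if alpha * math.dist(points[best], points[v]) > math.dist(p, points[v])
        ]
    return kept


def reprune_graph(graph, points, new_alpha, max_degree):
    """Re-prune every out-neighbor list in place of a full rebuild.

    Assumption: candidates are only the edges already present in `graph`.
    """
    return {
        u: alpha_prune(u, nbrs, points, new_alpha, max_degree)
        for u, nbrs in graph.items()
    }


# Toy data (hypothetical): three collinear points plus one off-axis point.
points = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (0.0, 3.0)}
graph = {0: [1, 2, 3], 1: [0, 2], 2: [1, 0], 3: [0]}

sparser = reprune_graph(graph, points, new_alpha=1.0, max_degree=8)
print(sparser[0])  # node 0 loses the edge to 2, which is covered by neighbor 1
```

In this toy graph, re-pruning node 0 with $\alpha = 1$ removes the edge to point 2 because point 1 lies between them, whereas $\alpha = 3$ would keep all three edges. Smaller values of $\alpha$ therefore give sparser graphs, which is the trade-off the parameter controls.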