Despite the efficacy of graph-based algorithms for Approximate Nearest Neighbor (ANN) search, how best to tune such systems remains unclear. This study introduces a method for tuning the performance of off-the-shelf graph-based indexes, focusing on vector dimensionality, database size, and the entry points of graph traversal. We use a black-box optimization algorithm to tune these factors jointly so that the index meets required levels of recall and Queries Per Second (QPS). We applied our approach to Task A of the SISAP 2023 Indexing Challenge and placed second in the 10M and 30M tracks. The tuned index substantially outperforms brute-force search. This research offers a universally applicable tuning method for graph-based indexes, one that extends beyond the specific conditions of the competition to broader uses.
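To make the idea of integrated black-box tuning concrete, the following is a minimal sketch rather than the method used in this work. It assumes plain random search as the optimizer, a brute-force scan in place of a graph index, coordinate truncation in place of a learned dimensionality reduction, and an operation count as a stand-in for inverse QPS. All function and parameter names (`random_search`, `evaluate`, `dim`, `n_db`) are hypothetical.

```python
import random


def make_data(n, dim, rng):
    """Generate n random Gaussian vectors of the given dimension (toy data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


def sq_dist(a, b, d):
    """Squared Euclidean distance using only the first d coordinates."""
    return sum((a[i] - b[i]) ** 2 for i in range(d))


def knn(db, q, k, d, n_db):
    """Indices of the k nearest vectors among the first n_db database entries."""
    order = sorted(range(n_db), key=lambda i: sq_dist(db[i], q, d))
    return order[:k]


def evaluate(params, db, queries, truth, k):
    """Measure (recall, cost) for one configuration.

    cost counts multiply-adds per query and is only a proxy for 1/QPS.
    """
    d, n_db = params["dim"], params["n_db"]
    hits = 0
    for q, gt in zip(queries, truth):
        hits += len(set(knn(db, q, k, d, n_db)) & set(gt))
    recall = hits / (k * len(queries))
    return recall, d * n_db


def random_search(db, queries, k, target_recall, n_trials, rng):
    """Find the cheapest sampled configuration whose recall meets the target."""
    full_dim, n = len(db[0]), len(db)
    # Ground truth: exact search over all dimensions and all vectors.
    truth = [knn(db, q, k, full_dim, n) for q in queries]
    # Seed with the untuned configuration so a feasible answer always exists.
    candidates = [{"dim": full_dim, "n_db": n}]
    for _ in range(n_trials):
        candidates.append(
            {"dim": rng.randint(1, full_dim), "n_db": rng.randint(k, n)}
        )
    best = None
    for params in candidates:
        recall, cost = evaluate(params, db, queries, truth, k)
        if recall >= target_recall and (best is None or cost < best["cost"]):
            best = {"params": params, "recall": recall, "cost": cost}
    return best
```

Calling `random_search(db, queries, k=5, target_recall=0.8, n_trials=20, rng=random.Random(0))` on data from `make_data` returns the cheapest sampled setting that still reaches the recall target. In practice the random sampler would be replaced by a dedicated black-box optimizer, and `evaluate` by a timed run of the real index.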