Dynamically Detect and Fix Hardness for Efficient Approximate Nearest Neighbor Search

Approximate Nearest Neighbor Search (ANNS) has become a fundamental component in many real-world applications. Among the various ANNS algorithms, graph-based methods are the state of the art. However, ANNS often suffers a significant drop in accuracy for certain queries, especially in Out-of-Distribution (OOD) scenarios. To address this issue, a recent approach named RoarGraph constructs a bipartite graph between the base data and historical queries to bridge the gap between the two distributions. However, it has several limitations: (1) Building a bipartite graph between two distributions lacks theoretical support, so the graph index does not use the query distribution effectively. (2) It requires a sufficient number of historical queries before graph construction and has a high construction time. (3) When the query workload changes, the index must be rebuilt to maintain high search accuracy. In this paper, we first propose Escape Hardness, a metric that evaluates the quality of the graph structure around a query. We then divide graph search into two stages and, guided by Escape Hardness, dynamically identify and fix defective graph regions in each stage. (1) For the stage that travels from the entry point to the vicinity of the query, we propose Reachability Fixing (RFix), which enhances the navigability of some key nodes. (2) For the stage that searches within the vicinity of the query, we propose Neighboring Graph Defects Fixing (NGFix), which improves graph connectivity in regions where queries are densely distributed. Extensive experiments show that our method outperforms other state-of-the-art methods on real-world datasets, achieving up to 2.25x faster search for OOD queries at 99% recall compared with RoarGraph and up to 6.88x faster search compared with HNSW. It also accelerates index construction by 2.35-9.02x compared with RoarGraph.
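
To make the two search stages concrete, the following is a minimal sketch of the generic best-first (beam) search that graph-based ANNS indexes commonly use. It is not the paper's implementation and does not compute Escape Hardness, which the abstract does not define. The function name `beam_search`, the parameters `k` and `ef`, and the toy graphs are all assumptions made for illustration. The second graph drops the only edge leading into one of the true nearest neighbors, which is a simple example of a connectivity defect near the query.

```python
import heapq
import math


def beam_search(graph, vectors, query, entry, k, ef):
    """Best-first search over a proximity graph; returns the k closest ids found.

    graph: dict mapping node id -> list of out-neighbor ids (hypothetical layout).
    ef: size of the result pool kept during the search (ef >= k).
    """
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    d0 = dist(entry)
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node first
    results = [(-d0, entry)]     # max-heap via negation: the best ef nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break  # no remaining candidate can improve the result pool
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return [n for _, n in sorted((-nd, n) for nd, n in results)[:k]]


# Toy data (assumed): six points on a line, linked as a chain 0-1-2-3-4-5.
vectors = {i: (float(i), 0.0) for i in range(6)}
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
# Same graph with the edge 4 -> 5 removed, so node 5 has no incoming edge.
broken = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [4]}

query = (4.2, 0.0)  # true 2 nearest neighbors are nodes 4 and 5
found_ok = beam_search(chain, vectors, query, entry=0, k=2, ef=3)
found_broken = beam_search(broken, vectors, query, entry=0, k=2, ef=3)
print(found_ok)      # [4, 5]
print(found_broken)  # [4, 3]
```

In the intact chain, the search first walks from the entry point toward the query (the first stage) and then explores the query's neighborhood (the second stage), returning both true neighbors. In the broken graph, node 5 is unreachable, so recall drops to 50% however large `ef` is; adding edges in such regions is the kind of repair the abstract attributes to NGFix.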
