Approximate $k$ nearest neighbor (AKNN) search in high-dimensional space is a foundational problem in vector databases, with widespread applications. Among the numerous AKNN indexes, proximity graph-based indexes achieve state-of-the-art search efficiency across various benchmarks. However, the extensive high-dimensional distance computations they require lead to slow construction and substantial memory overhead. When handling large-scale datasets, limited memory capacity often prevents building the entire index at once, so a common practice is to build multiple sub-indexes separately. However, searching these separate indexes directly severely compromises search efficiency, because queries cannot exploit cross-graph connections. Efficient graph index merging is therefore crucial for multi-index search. In this paper, we study both the efficient merging of two indexes and the order in which multiple indexes are merged for AKNN search. To this end, we propose reverse neighbor sliding merge (RNSM), which exploits structural information to speed up merging. We further investigate merge order selection (MOS), which reduces the overall merging cost by eliminating redundant merge operations. Experiments show that our approach achieves up to a 5.48$\times$ speedup over existing index merging methods and a 9.92$\times$ speedup over index reconstruction, while maintaining the expected superior search performance. Moreover, our method scales efficiently to 100 million vectors with 50 partitions, with consistent speedups.
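To make the cost of "directly searching on separated indexes" concrete, the sketch below shows the usual fan-out baseline: the query is sent to every sub-index, each returns its local top-$k$, and the candidates are merged into a global top-$k$. This is an illustrative sketch, not the paper's RNSM or MOS method. The helper names (`search_partition`, `multi_index_search`) are hypothetical, and an exact linear scan stands in for a per-partition graph search. The point is that every query must probe all partitions, so search work grows with the number of sub-indexes, which is the overhead a single merged graph avoids.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Partition = List[Tuple[int, Vector]]  # (vector id, vector) pairs in one sub-index


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_partition(partition: Partition, query: Vector, k: int) -> List[Tuple[float, int]]:
    """Hypothetical stand-in for a graph search on one sub-index.

    Uses an exact scan and returns up to k (distance, id) pairs, closest first.
    """
    return heapq.nsmallest(k, ((squared_l2(vec, query), vid) for vid, vec in partition))


def multi_index_search(partitions: List[Partition], query: Vector, k: int) -> List[int]:
    """Fan-out baseline: probe every sub-index, then keep the global top-k ids."""
    candidates: List[Tuple[float, int]] = []
    for part in partitions:
        # Each partition is searched independently; no cross-graph edges are used.
        candidates.extend(search_partition(part, query, k))
    return [vid for _, vid in heapq.nsmallest(k, candidates)]


if __name__ == "__main__":
    parts = [
        [(0, (0.0, 0.0)), (1, (5.0, 5.0))],
        [(2, (1.0, 0.0)), (3, (9.0, 9.0))],
    ]
    print(multi_index_search(parts, (0.2, 0.0), 2))  # prints [0, 2]
```

With $P$ partitions, each query triggers $P$ independent searches plus a final merge of up to $P \cdot k$ candidates.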