Approximate Nearest Neighbor Search (ANNS) over high-dimensional vectors is a foundational problem in databases, where disk I/O often becomes the dominant performance bottleneck at scale. To accelerate search, graph-based indexes rely on a proximity graph, in which nodes represent vectors and edges guide the traversal toward the target. However, existing graph indexing solutions for disk-based ANNS typically either optimize the storage layout for a given graph or construct the graph independently of the storage layout, and thus overlook the interaction between the two. In this paper, we bridge this gap by proposing the Block-aware Monotonic Relative Neighborhood Graph (BMRNG), which theoretically guarantees the existence of I/O-monotonic search paths. The core idea is to align the graph topology with the data placement by jointly considering geometric distance and storage layout during edge selection. To address the scalability challenge of BMRNG construction, we further develop a practical and efficient variant, the Block-Aware Monotonic Graph (BAMG), which can be constructed in linear time from a monotonic graph while taking the storage layout into account. BAMG integrates block-aware edge pruning with a decoupled storage design that separates raw vectors from the graph index, thereby maximizing block utilization and minimizing redundant disk reads. Additionally, we design a multi-layer navigation graph for adaptive and efficient query entry, along with a block-first search algorithm that prioritizes intra-block traversal to fully exploit each disk I/O operation. Extensive experiments on real-world datasets show that BAMG can outperform state-of-the-art methods in search performance.
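To make the idea of block-first search concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes a toy model in which expanding a node (reading its adjacency list) requires its disk block to be resident, while distances to candidates can be evaluated in memory. The function `block_first_search`, its parameters (`block_of`, `ef`, `k`), and the stopping rule are hypothetical names and choices introduced only for illustration.

```python
import math


def block_first_search(query, vectors, neighbors, block_of, entry, k=1, ef=4):
    """Toy block-first beam search (illustrative assumption, not BAMG itself).

    vectors[i]   : coordinates of node i
    neighbors[i] : out-neighbors of node i in the proximity graph
    block_of[i]  : id of the disk block that stores node i
    Returns (k closest expanded node ids, number of simulated block reads).
    """
    def dist(i):
        return math.dist(query, vectors[i])

    loaded = set()                   # blocks already read (simulated I/O)
    visited = {entry}                # nodes ever placed in the frontier
    frontier = {entry: dist(entry)}  # unexpanded candidates: id -> distance
    expanded = {}                    # expanded nodes: id -> distance
    io_count = 0

    while frontier:
        # Pruning bound: distance of the ef-th best expanded node so far.
        ds = sorted(expanded.values())
        bound = ds[ef - 1] if len(ds) >= ef else math.inf

        best = min(frontier, key=frontier.get)
        if frontier[best] >= bound:
            break  # no remaining candidate can improve the beam

        # Block-first step: prefer promising candidates whose block is
        # already resident, so each block read is used as fully as possible.
        resident = [n for n in frontier
                    if block_of[n] in loaded and frontier[n] < bound]
        node = min(resident, key=frontier.get) if resident else best

        d = frontier.pop(node)
        if block_of[node] not in loaded:
            loaded.add(block_of[node])  # simulate one disk read
            io_count += 1
        expanded[node] = d

        for nb in neighbors[node]:
            if nb not in visited:
                visited.add(nb)
                frontier[nb] = dist(nb)

    top = sorted(expanded, key=expanded.get)[:k]
    return top, io_count


# Usage on a hypothetical 6-node chain stored two nodes per block.
vectors = [(float(i), 0.0) for i in range(6)]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)]
block_of = [0, 0, 1, 1, 2, 2]
ids, reads = block_first_search((5.0, 0.0), vectors, neighbors, block_of,
                                entry=0, k=1, ef=2)
```

In this toy run the search reaches node 5 after reading each of the three blocks once, because both nodes of a block are expanded before the next block is fetched. A real disk-based index would additionally need the layout-aware edge selection described in the abstract; this sketch covers only the traversal order.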