Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors of a query vector among those objects whose numerical attributes fall within the range specified by the query. To achieve high query performance, existing state-of-the-art methods for RFANN search often construct multiple range-specific graph indexes, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. We prove that the RRNG satisfies two crucial properties: (1) monotonic searchability, which guarantees that beam search retrieves the correct nearest neighbors; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. Building on this theoretical foundation, we propose a new graph index, RNSG, as a practical solution that efficiently approximates the RRNG. We develop fast algorithms both for constructing the RNSG index and for processing RFANN queries with it. Extensive experiments on five real-world datasets show that, compared with existing state-of-the-art methods, RNSG achieves significantly higher query performance with a more compact index and a lower construction cost.
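To make the query semantics and the idea of searching a range-induced subgraph concrete, here is a minimal Python sketch. It assumes Euclidean distance and an inclusive attribute range [lo, hi]; neither is fixed by the text above. `rfann_exact` computes the exact answer by a linear scan, and `beam_search_in_range` runs a generic beam search over one graph while skipping every node whose attribute falls outside the range, so it effectively searches the range-induced subgraph. All function and parameter names are hypothetical, and this is not the RNSG construction or query algorithm, which is not specified here.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
Graph = Dict[int, List[int]]


def dist(a: Vector, b: Vector) -> float:
    # Euclidean distance (an assumption; no metric is fixed in the text).
    return math.dist(a, b)


def rfann_exact(vectors: List[Vector], attrs: List[float], q: Vector,
                lo: float, hi: float, k: int) -> List[int]:
    """Ground truth: the k nearest objects whose attribute lies in [lo, hi]."""
    in_range = [i for i, a in enumerate(attrs) if lo <= a <= hi]
    return sorted(in_range, key=lambda i: dist(vectors[i], q))[:k]


def beam_search_in_range(graph: Graph, vectors: List[Vector], attrs: List[float],
                         q: Vector, lo: float, hi: float, entry: int,
                         k: int, beam_width: int) -> List[int]:
    """Generic beam search that only visits in-range nodes, i.e. it searches
    the subgraph of `graph` induced by the query range [lo, hi]."""
    if not lo <= attrs[entry] <= hi:
        raise ValueError("entry point must satisfy the range filter")
    d0 = dist(vectors[entry], q)
    visited = {entry}
    frontier = [(d0, entry)]  # min-heap: candidates still to expand
    beam = [(-d0, entry)]     # max-heap via negation: best beam_width seen
    while frontier:
        d, u = heapq.heappop(frontier)
        if len(beam) >= beam_width and d > -beam[0][0]:
            break  # nearest unexpanded candidate cannot improve the beam
        for v in graph[u]:
            if v in visited or not lo <= attrs[v] <= hi:
                continue  # skip seen nodes and nodes outside the range
            visited.add(v)
            dv = dist(vectors[v], q)
            if len(beam) < beam_width or dv < -beam[0][0]:
                heapq.heappush(frontier, (dv, v))
                heapq.heappush(beam, (-dv, v))
                if len(beam) > beam_width:
                    heapq.heappop(beam)  # evict the current farthest node
    # Return up to k node ids ordered by increasing distance.
    return [i for _, i in sorted((-nd, i) for nd, i in beam)][:k]


# Toy data (hypothetical): ten 1-D points at 0..9 with arbitrary attributes,
# linked as a simple path i -- i+1.
vectors = [(float(i),) for i in range(10)]
attrs = [5, 1, 8, 3, 9, 2, 7, 4, 6, 0]
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
print(rfann_exact(vectors, attrs, (4.2,), 3, 8, k=3))  # [3, 6, 2]
print(beam_search_in_range(path, vectors, attrs, (4.2,), 3, 8,
                           entry=3, k=3, beam_width=4))  # [3, 2]
```

The toy run illustrates why a single arbitrary graph is not enough. On the path graph, the nodes with attributes in [3, 8] split into disconnected pieces, so the search from node 3 returns only [3, 2] and misses node 6. Monotonic searchability together with structural heredity is what the RRNG theory relies on to rule out this failure for every query range.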