Range-filtered approximate nearest neighbor (RFANN) search is a fundamental operation in modern data systems. Given a set of objects, each with a vector and a numerical attribute, an RFANN query retrieves the nearest neighbors to a query vector among those objects whose numerical attributes fall within the range specified by the query. Existing state-of-the-art methods for RFANN search often require constructing multiple range-specific graph indexes to achieve high query performance, which incurs significant indexing overhead. To address this, we first establish a novel graph indexing theory, the range-aware relative neighborhood graph (RRNG), which jointly considers spatial and attribute proximity. We prove that the RRNG satisfies two crucial properties: (1) monotonic search-ability, which ensures correct nearest neighbor retrieval via beam search; and (2) structural heredity, which guarantees that any range-induced subgraph remains a valid RRNG, thus enabling efficient search with a single graph index. Based on this theoretical foundation, we propose a new graph index called RNSG as a practical solution that efficiently approximates RRNG. We develop fast algorithms for both constructing the RNSG index and processing RFANN queries with it. Extensive experiments on five real-world datasets show that RNSG achieves significantly higher query performance with a more compact index and lower construction cost than existing state-of-the-art methods.
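To make the query semantics concrete, the following is a minimal sketch of an exact, brute-force range-filtered nearest neighbor search. It is not the RNSG index or the RRNG construction; it only illustrates what answer an RFANN method is trying to approximate. The function name `range_filtered_knn`, its parameters, the use of Euclidean distance, and the closed range `[lo, hi]` are all assumptions made for illustration and are not taken from the paper.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def range_filtered_knn(
    vectors: Sequence[Sequence[float]],
    attrs: Sequence[float],
    query: Sequence[float],
    lo: float,
    hi: float,
    k: int,
) -> List[Tuple[float, int]]:
    """Exact baseline (hypothetical helper, not the paper's algorithm).

    Returns up to k (distance, object_index) pairs, nearest first, taken only
    from objects whose attribute lies in the closed range [lo, hi].
    Euclidean distance is an assumption; the abstract does not fix a metric.
    """
    candidates = (
        (math.dist(vec, query), idx)
        for idx, (vec, attr) in enumerate(zip(vectors, attrs))
        if lo <= attr <= hi  # range filter on the numerical attribute
    )
    # Linear scan over all in-range objects; a graph index avoids this cost.
    return heapq.nsmallest(k, candidates)


if __name__ == "__main__":
    vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
    attrs = [10, 25, 30, 50]
    # Object 0 is closest to the query but its attribute (10) is out of range.
    print(range_filtered_knn(vectors, attrs, (0.0, 0.0), lo=20, hi=50, k=2))
```

In this toy data the unfiltered nearest neighbor is excluded by the range predicate, which is the situation that makes a plain ANN index insufficient. The scan touches every object, so it serves only as a correctness reference against which an approximate index could be measured.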