Nearest neighbor search on high-dimensional vectors is fundamental in modern AI and database systems. In many real-world applications, queries involve constraints on multiple numeric attributes, giving rise to range-filtering approximate nearest neighbor search (RFANNS). While there exist RFANNS indexes for single-attribute range predicates, extending them to the multi-attribute setting is nontrivial and often ineffective. In this paper, we propose KHI, an index for multi-attribute RFANNS that combines an attribute-space partitioning tree with HNSW graphs attached to tree nodes. A skew-aware splitting rule bounds the tree height by $O(\log n)$, and queries are answered by routing through the tree and running greedy search on the HNSW graphs. Experiments on four real-world datasets show that KHI consistently achieves high query throughput while maintaining high recall. Compared with the state-of-the-art RFANNS baseline, KHI improves QPS by $2.46\times$ on average and up to $16.22\times$ on the hard dataset, with larger gains for smaller selectivity, larger $k$, and higher predicate cardinality.
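The tree-plus-index design described above can be illustrated with a minimal Python sketch. It is not KHI itself, and several parts are assumptions. The abstract does not specify the skew-aware splitting rule, so a median cut on the widest attribute stands in for it. An exact scan (the hypothetical helper `node_topk`) replaces greedy search on each node's HNSW graph. The names `build`, `range_knn`, and `node_topk` are illustrative only. A query box is routed through the tree. Nodes whose attribute bounding box is disjoint from the box are pruned. Nodes fully inside the box are searched directly. Partially covered leaves are filtered point by point. The per-node top-k lists are then merged.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    ids: List[int]                    # ids of all points stored under this node
    lo: List[float]                   # per-attribute minimum over those points
    hi: List[float]                   # per-attribute maximum over those points
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(ids: List[int], attrs: Sequence[Sequence[float]], leaf_size: int = 4) -> Node:
    """Recursively partition the attribute space (assumes leaf_size >= 1)."""
    dims = len(attrs[ids[0]])
    lo = [min(attrs[i][d] for i in ids) for d in range(dims)]
    hi = [max(attrs[i][d] for i in ids) for d in range(dims)]
    node = Node(ids, lo, hi)
    if len(ids) > leaf_size:
        # Hypothetical splitting rule: cut the widest attribute at its median.
        # A median cut halves the point count, so the height is O(log n).
        d = max(range(dims), key=lambda j: hi[j] - lo[j])
        order = sorted(ids, key=lambda i: attrs[i][d])
        mid = len(order) // 2
        node.left = build(order[:mid], attrs, leaf_size)
        node.right = build(order[mid:], attrs, leaf_size)
    return node


def node_topk(ids, vecs, q, k):
    # Stand-in for greedy search on the node's HNSW graph: an exact scan.
    return heapq.nsmallest(k, ids, key=lambda i: math.dist(vecs[i], q))


def range_knn(root, vecs, attrs, q, qlo, qhi, k):
    """Top-k ids among points whose attributes lie in the box [qlo, qhi]."""
    def matches(a):
        return all(l <= x <= h for x, l, h in zip(a, qlo, qhi))

    partial = []  # per-node top-k lists gathered during routing
    stack = [root]
    while stack:
        node = stack.pop()
        box = list(zip(node.lo, node.hi, qlo, qhi))
        if any(h < ql or l > qh for l, h, ql, qh in box):
            continue  # node's bounding box is disjoint from the query box
        if all(ql <= l and h <= qh for l, h, ql, qh in box):
            # Every point under this node satisfies the predicate.
            partial.append(node_topk(node.ids, vecs, q, k))
        elif node.left is None:
            # Partially covered leaf: filter points individually.
            hits = [i for i in node.ids if matches(attrs[i])]
            partial.append(node_topk(hits, vecs, q, k))
        else:
            stack.extend((node.left, node.right))
    # Merge the per-node candidates into the global top-k.
    merged = [i for ids in partial for i in ids]
    return heapq.nsmallest(k, merged, key=lambda i: math.dist(vecs[i], q))
```

In this sketch, a node whose bounding box lies entirely inside the query range is searched without any per-point filtering, so only partially covered leaves pay the filtering cost. Because the visited nodes cover disjoint sets of points, merging their per-node top-k lists yields the exact global top-k here. With approximate graph search in place of `node_topk`, the result would be approximate.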