Efficient Approximate Nearest Neighbor Search under Multi-Attribute Range Filter

Nearest neighbor search on high-dimensional vectors is fundamental in modern AI and database systems. In many real-world applications, queries involve constraints on multiple numeric attributes, giving rise to range-filtering approximate nearest neighbor search (RFANNS). While there exist RFANNS indexes for single-attribute range predicates, extending them to the multi-attribute setting is nontrivial and often ineffective. In this paper, we propose KHI, an index for multi-attribute RFANNS that combines an attribute-space partitioning tree with HNSW graphs attached to tree nodes. A skew-aware splitting rule bounds the tree height by $O(\log n)$, and queries are answered by routing through the tree and running greedy search on the HNSW graphs. Experiments on four real-world datasets show that KHI consistently achieves high query throughput while maintaining high recall. Compared with the state-of-the-art RFANNS baseline, KHI improves QPS by $2.46\times$ on average and up to $16.22\times$ on the hard dataset, with larger gains for smaller selectivity, larger $k$, and higher predicate cardinality.
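
The abstract describes the index only at a high level, so the following is a minimal sketch of the general idea rather than KHI itself. It assumes a k-d-style tree over the attribute space that splits at the median by count (a stand-in for the paper's skew-aware rule, which is not specified here), and it replaces the per-node HNSW graph with an exact linear scan so that the example stays self-contained. All names (`Node`, `build`, `search`, `leaf_size`) are hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Node:
    ids: List[int]          # ids of all points under this node
    lo: List[float]         # per-attribute minimum over ids
    hi: List[float]         # per-attribute maximum over ids
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(ids, attrs, leaf_size=8, depth=0):
    """Partition the attribute space; splitting by count halves every node,
    so the height stays logarithmic even when attribute values are skewed."""
    m = len(attrs[0])
    lo = [min(attrs[i][d] for i in ids) for d in range(m)]
    hi = [max(attrs[i][d] for i in ids) for d in range(m)]
    node = Node(ids, lo, hi)
    if len(ids) > leaf_size:
        d = depth % m  # assumption: cycle through attributes round-robin
        order = sorted(ids, key=lambda i: attrs[i][d])
        mid = len(order) // 2
        node.left = build(order[:mid], attrs, leaf_size, depth + 1)
        node.right = build(order[mid:], attrs, leaf_size, depth + 1)
    return node


def search(root, vectors, attrs, q, qlo, qhi, k):
    """Return ids of the k nearest vectors whose attributes lie in [qlo, qhi]."""
    heap = []  # max-heap on distance, stored as (-distance, id)

    def scan(ids, check):
        # Stand-in for a per-node HNSW search: an exact linear scan.
        for i in ids:
            if check and not all(l <= a <= h
                                 for a, l, h in zip(attrs[i], qlo, qhi)):
                continue
            item = (-sq_dist(vectors[i], q), i)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)

    def visit(n):
        if any(nh < l or nl > h
               for nl, nh, l, h in zip(n.lo, n.hi, qlo, qhi)):
            return  # node box is disjoint from the query box
        if all(l <= nl and nh <= h
               for nl, nh, l, h in zip(n.lo, n.hi, qlo, qhi)):
            scan(n.ids, check=False)  # fully covered: no per-point filtering
        elif n.left is None:
            scan(n.ids, check=True)   # partially covered leaf: filter points
        else:
            visit(n.left)
            visit(n.right)

    visit(root)
    return [i for _, i in sorted(heap, reverse=True)]  # nearest first
```

Calling `build(list(range(n)), attrs)` once and then `search(root, vectors, attrs, q, qlo, qhi, k)` per query returns exact results in this sketch. In a real index, the scan over a fully covered node would presumably be replaced by an approximate graph search on that node's own HNSW graph, which is where the throughput gains reported in the abstract would come from.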

