BBC: Improving Large-k Approximate Nearest Neighbor Search with a Bucket-based Result Collector

Although Approximate Nearest Neighbor (ANN) search has been extensively studied, large-k ANN queries, which aim to retrieve a large number of nearest neighbors, remain underexplored despite their numerous real-world applications. Existing ANN methods suffer significant performance degradation on such queries. In this work, we first investigate why quantization-based ANN indexes degrade: (1) existing top-k collectors are inefficient and incur significant overhead in candidate maintenance, and (2) quantization methods prune less effectively, which leads to a costly re-ranking process. To address these issues, we propose a novel bucket-based result collector (BBC) that improves the efficiency of existing quantization-based ANN indexes for large-k ANN queries. BBC introduces two key components: (1) a bucket-based result buffer that organizes candidates into buckets by their distances to the query, a design that reduces ranking costs and improves cache efficiency, enabling high-performance maintenance of a candidate superset and a lightweight final selection of the top-k results; and (2) two re-ranking algorithms tailored to different types of quantization methods, which accelerate re-ranking by reducing either the number of candidate objects to be re-ranked or the number of cache misses. Extensive experiments on real-world datasets demonstrate that BBC accelerates existing quantization-based ANN methods by up to 3.8x at recall@k = 0.95 for large-k ANN queries.
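The abstract does not spell out how the bucket-based result buffer is laid out, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes equal-width distance buckets over a known upper bound `max_dist`; the class name `BucketCollector`, its parameters, and the cutoff rule are all hypothetical choices made for illustration. Candidates are appended to a bucket without any ordering, buckets that can no longer contribute to the top-k are dropped, and only the single boundary bucket needs a partial selection at the end.

```python
import heapq
import random


class BucketCollector:
    """Toy bucket-based top-k collector (illustrative assumption, not BBC itself)."""

    def __init__(self, k, max_dist, num_buckets=64):
        self.k = k
        self.width = max_dist / num_buckets      # assumed equal-width buckets
        self.buckets = [[] for _ in range(num_buckets)]
        self.cutoff = num_buckets - 1            # highest bucket still kept
        self.size = 0                            # items held in buckets 0..cutoff

    def push(self, dist, obj_id):
        # Map the distance to a bucket index; clamp overflow into the last bucket.
        b = min(int(dist / self.width), len(self.buckets) - 1)
        if b > self.cutoff:
            return False                         # at least k closer items are held
        self.buckets[b].append((dist, obj_id))   # O(1) append, no ordering kept
        self.size += 1
        # Drop the highest bucket while the lower buckets alone hold >= k items.
        while (self.cutoff > 0
               and self.size - len(self.buckets[self.cutoff]) >= self.k):
            self.size -= len(self.buckets[self.cutoff])
            self.buckets[self.cutoff] = []
            self.cutoff -= 1
        return True

    def top_k(self):
        # Take whole buckets in increasing distance order; select only
        # inside the boundary bucket that would overshoot k.
        out = []
        for bucket in self.buckets[: self.cutoff + 1]:
            need = self.k - len(out)
            if need <= 0:
                break
            if len(bucket) <= need:
                out.extend(bucket)
            else:
                out.extend(heapq.nsmallest(need, bucket))
        return out


# Usage: synthetic distances in [0, 1), compared against a full sort.
random.seed(0)
points = [(random.random(), i) for i in range(5000)]
collector = BucketCollector(k=500, max_dist=1.0)
for dist, obj_id in points:
    collector.push(dist, obj_id)
result = collector.top_k()
```

Compared with a binary heap, each insertion here avoids the O(log k) sift, which is the kind of candidate-maintenance overhead the abstract attributes to existing top-k collectors. The returned list is grouped by bucket rather than fully sorted; a caller that needs sorted output would sort `result` afterwards.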

