Web-scale search systems learn an encoder to embed a given query; the resulting embedding is then fed into an approximate nearest neighbor search (ANNS) pipeline to retrieve similar data points. To accurately capture tail queries and data points, learned representations are typically rigid, high-dimensional vectors that are used as-is throughout the entire ANNS pipeline, which can make retrieval computationally expensive. In this paper, we argue that instead of rigid representations, different stages of ANNS can leverage adaptive representations of varying capacities to achieve significantly better accuracy-compute trade-offs; i.e., stages of ANNS that can tolerate more approximate computation should use a lower-capacity representation of the same data point. To this end, we introduce AdANNS, a novel ANNS design framework that explicitly leverages the flexibility of Matryoshka Representations. We demonstrate state-of-the-art accuracy-compute trade-offs using novel AdANNS-based key ANNS building blocks such as search data structures (AdANNS-IVF) and quantization (AdANNS-OPQ). For example, on ImageNet retrieval, AdANNS-IVF is up to 1.5% more accurate than rigid-representation-based IVF at the same compute budget, and matches its accuracy while being up to 90x faster in wall-clock time. For Natural Questions, 32-byte AdANNS-OPQ matches the accuracy of the 64-byte OPQ baseline built on rigid representations: same accuracy at half the cost! We further show that the gains from AdANNS carry over to modern composite ANNS indices that combine search structures and quantization. Finally, we demonstrate that AdANNS can enable inference-time adaptivity for compute-aware search on ANNS indices built non-adaptively on Matryoshka Representations. Code is open-sourced at https://github.com/RAIVNLab/AdANNS.
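To make the "lower-capacity representation for more approximate stages" idea concrete, here is a minimal, hypothetical sketch of an adaptive IVF index. It assumes that the leading dimensions of a Matryoshka embedding form a usable lower-capacity representation, and it uses the first `d_c` dimensions for k-means clustering and list probing (the coarse, approximate stage) and the first `d_s` dimensions for ranking the candidates. The names `AdaptiveIVF`, `d_c`, `d_s`, and `n_probe` are illustrative assumptions, plain Euclidean k-means is a stand-in for whatever clustering a real system would use, and this is not the code in the AdANNS repository.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 10, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means (a stand-in clustering step, not the paper's)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        buckets: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            buckets[j].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a cluster empties out
                dim = len(members[0])
                centroids[c] = [sum(m[i] for m in members) / len(members) for i in range(dim)]
    return centroids


class AdaptiveIVF:
    """Hypothetical IVF: cluster/probe on the first d_c dims, rank on the first d_s dims."""

    def __init__(self, data: List[Vector], d_c: int, d_s: int, n_lists: int, seed: int = 0):
        self.data = data
        self.d_c, self.d_s = d_c, d_s
        # Coarse stage: low-capacity prefixes of each embedding.
        coarse = [x[:d_c] for x in data]
        self.centroids = kmeans(coarse, n_lists, seed=seed)
        # Inverted lists: each point is assigned to its nearest coarse centroid.
        self.lists: List[List[int]] = [[] for _ in range(n_lists)]
        for idx, x in enumerate(coarse):
            j = min(range(n_lists), key=lambda c: sq_dist(x, self.centroids[c]))
            self.lists[j].append(idx)

    def search(self, query: Vector, k: int, n_probe: int) -> List[int]:
        # Probe the n_probe lists whose centroids are closest in d_c dims.
        q_c = query[: self.d_c]
        order = sorted(range(len(self.centroids)), key=lambda c: sq_dist(q_c, self.centroids[c]))
        candidates = [i for c in order[:n_probe] for i in self.lists[c]]
        # Ranking stage: compare candidates using the higher-capacity d_s prefix.
        q_s = query[: self.d_s]
        candidates.sort(key=lambda i: sq_dist(q_s, self.data[i][: self.d_s]))
        return candidates[:k]
```

With `n_probe` equal to the number of lists and `d_s` equal to the full dimensionality, `search` reduces to exact brute-force search; shrinking `d_c` makes clustering and probing cheaper, and shrinking `d_s` makes candidate ranking cheaper, which is the kind of per-stage trade-off the abstract describes.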