Nowadays, data is commonly represented by vectors. Retrieving, from among millions or billions of vectors, those that are similar to a given query is a ubiquitous problem, known as similarity search, that is relevant to a wide range of applications. Graph-based indices are currently the best-performing techniques for billion-scale similarity search. However, their random-access memory pattern makes it challenging to realize their full potential. In this work, we present new techniques and systems for creating faster and smaller graph-based indices. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that uses per-vector scaling and scalar quantization to improve search performance through fast similarity computations and a reduced effective bandwidth, while decreasing memory footprint and barely impacting accuracy. LVQ, when combined with a new high-performance computing system for graph-based similarity search, establishes a new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outcompetes the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x memory footprint reduction, and (2) in the high-throughput regime, by 5.8x with 1.4x less memory.
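The combination of per-vector scaling and scalar quantization can be illustrated with a short sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: it mean-centers the dataset once, then encodes each vector with B-bit integer codes whose bounds are that vector's own minimum and maximum, so only two scalars (`lo` and `delta`) are stored per vector. The function names (`center`, `quantize`, `dequantize`, `inner_product`) are hypothetical. The last function shows why similarity computations stay cheap: the inner product with the reconstructed vector splits into a term on the integer codes plus terms that depend only on the query and the per-vector constants.

```python
from typing import List, Tuple

Vector = List[float]


def center(data: List[Vector]) -> Tuple[Vector, List[Vector]]:
    """Subtract the dataset-wide mean from every vector (done once, before quantizing)."""
    d = len(data[0])
    mean = [sum(v[j] for v in data) / len(data) for j in range(d)]
    return mean, [[v[j] - mean[j] for j in range(d)] for v in data]


def quantize(x: Vector, bits: int = 8) -> Tuple[List[int], float, float]:
    """Scalar-quantize one vector using bounds computed from that vector alone."""
    lo, hi = min(x), max(x)
    levels = (1 << bits) - 1
    # Per-vector step size; a constant vector gets an arbitrary nonzero step.
    delta = (hi - lo) / levels if hi > lo else 1.0
    # Round each component to the nearest level, clamped to the top code.
    codes = [min(levels, int((v - lo) / delta + 0.5)) for v in x]
    return codes, lo, delta


def dequantize(codes: List[int], lo: float, delta: float) -> Vector:
    """Reconstruct the (mean-centered) vector from its codes and two constants."""
    return [lo + c * delta for c in codes]


def inner_product(q: Vector, mean: Vector, codes: List[int], lo: float, delta: float) -> float:
    """<q, mean + x_hat> computed as <q, mean> + lo * sum(q) + delta * <q, codes>."""
    q_dot_mean = sum(qi * mi for qi, mi in zip(q, mean))  # shared by all vectors
    q_dot_codes = sum(qi * c for qi, c in zip(q, codes))  # the only per-vector dot product
    return q_dot_mean + lo * sum(q) + delta * q_dot_codes
```

Because the bounds adapt to each vector, `delta` tracks that vector's own dynamic range, so the per-component reconstruction error is at most `delta / 2`. With 8-bit codes, a d-dimensional float32 vector shrinks from 4d bytes to roughly d bytes plus the two per-vector constants.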