Nowadays, data is commonly represented as vectors. Retrieving, among millions or billions of vectors, those that are similar to a given query is a ubiquitous problem relevant to a wide range of applications. In this work, we present new techniques for building faster and smaller indices to run these searches. To this end, we introduce a novel vector compression method, Locally-adaptive Vector Quantization (LVQ), that simultaneously reduces memory footprint and improves search performance, with minimal impact on search accuracy. LVQ is designed to work best in conjunction with graph-based indices, reducing the effective memory bandwidth they consume while enabling fast, random-access-friendly similarity computations. Our experimental results show that LVQ, combined with key optimizations for graph-based indices on modern datacenter systems, establishes a new state of the art in terms of performance and memory footprint. For billions of vectors, LVQ outperforms the second-best alternatives: (1) in the low-memory regime, by up to 20.7x in throughput with up to a 3x reduction in memory footprint, and (2) in the high-throughput regime, by 5.8x in throughput with 1.4x less memory.
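The abstract does not spell out how LVQ encodes a vector. The following is a minimal sketch, assuming a scheme consistent with the name "locally adaptive": the dataset mean is subtracted, and each vector is then scalar-quantized to B bits using its own minimum and step size, so the quantization bounds adapt to each vector individually. The function names (`lvq_encode`, `lvq_decode`, `lvq_inner_product`, `compressed_bytes`) and the choice of two float32 per-vector constants are hypothetical illustrations, not the reference implementation.

```python
import math
from typing import List, Tuple

# Hypothetical sketch of locally-adaptive scalar quantization; not the
# authors' implementation. Each vector keeps its own (lo, delta) pair.


def lvq_encode(x: List[float], mean: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Mean-center x, then quantize each component to `bits` bits using
    bounds computed from this vector alone."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    lo, hi = min(centered), max(centered)
    levels = (1 << bits) - 1
    # The step size adapts to this vector's own range; guard constant vectors.
    delta = (hi - lo) / levels if hi > lo else 1.0
    # Round to the nearest level; the clamp only guards against float drift.
    codes = [min(levels, max(0, round((c - lo) / delta))) for c in centered]
    return codes, lo, delta


def lvq_decode(codes: List[int], lo: float, delta: float, mean: List[float]) -> List[float]:
    """Reconstruct an approximation of the original (uncentered) vector."""
    return [lo + c * delta + m for c, m in zip(codes, mean)]


def lvq_inner_product(query: List[float], codes: List[int], lo: float,
                      delta: float, query_dot_mean: float) -> float:
    """Inner product of the query with the decoded vector, without decoding it:
    <q, lo*1 + delta*codes + mean> = lo*sum(q) + delta*<q, codes> + <q, mean>.
    Only this vector's codes and constants are read, which suits the random
    access pattern of a graph traversal. <q, mean> is computed once per query."""
    return lo * sum(query) + delta * sum(q * c for q, c in zip(query, codes)) + query_dot_mean


def compressed_bytes(dim: int, bits: int, const_bytes: int = 8) -> int:
    """Storage per vector: packed codes plus two per-vector constants
    (assumed float32 here, i.e. 8 bytes in total)."""
    return math.ceil(dim * bits / 8) + const_bytes
```

Because each component is rounded to the nearest level, its reconstruction error is at most `delta / 2`, and `delta` shrinks for vectors with a narrow range. Under these assumptions, with 8-bit codes a 96-dimensional float32 vector (384 bytes) is stored in 104 bytes; the per-vector constants are the small fixed price paid for adapting the quantization grid to each vector.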