Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that these methods form one field defined by three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: for six of seven embedders, a one-bit code with full-precision re-ranking matches uncompressed retrieval quality, while the code that is scanned is one thirty-second the size of the float embedding. Second, the orderings the lens anticipates, including a learned-embedding regime in which binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding dimension grows. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float embedding it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.
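The one-bit-plus-re-ranking result rests on a simple two-stage pipeline: scan compact binary codes to build a shortlist, then re-score only that shortlist with the original floats. The NumPy sketch below illustrates the idea under stated assumptions. It thresholds each coordinate at zero (one bit per dimension), scores by inner product on unit-normalised vectors, and uses a fixed shortlist size. The helper names `binarise`, `hamming_distances` and `search_with_rerank` are hypothetical and not part of BitBudget, and the threshold placement here need not match the one used in the paper's measurements. For a 512-dimensional float32 embedding (2,048 bytes), the packed code is 64 bytes, which is the one-thirty-second ratio quoted above.

```python
import numpy as np


def binarise(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one bit per dimension (coordinate > 0), packed 8 bits per byte.

    Assumption: thresholds sit at zero; the paper may place them elsewhere.
    """
    return np.packbits(x > 0, axis=-1)


def hamming_distances(codes: np.ndarray, q_code: np.ndarray) -> np.ndarray:
    """Count differing bits between each packed database code and the packed query code."""
    return np.unpackbits(np.bitwise_xor(codes, q_code), axis=-1).sum(axis=-1)


def search_with_rerank(db: np.ndarray, codes: np.ndarray, query: np.ndarray,
                       k: int, n_candidates: int) -> np.ndarray:
    """Scan the 1-bit codes for a shortlist, then re-rank it with full-precision inner products."""
    dists = hamming_distances(codes, binarise(query))
    shortlist = np.argsort(dists, kind="stable")[:n_candidates]
    scores = db[shortlist] @ query  # exact float scores, computed for the shortlist only
    return shortlist[np.argsort(-scores, kind="stable")[:k]]


# Usage on synthetic data (assumption: random unit vectors stand in for real embeddings).
rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 512)).astype(np.float32)
db /= np.linalg.norm(db, axis=1, keepdims=True)
codes = binarise(db)

print(db.nbytes // codes.nbytes)  # 32: float32 uses 32 bits per dimension, the code uses 1
top = search_with_rerank(db, codes, db[7], k=5, n_candidates=50)
```

The design choice is that only `codes` must be scanned in full; the float vectors are read just for the shortlist, so they can live in slower storage without slowing the scan.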