Multi-vector retrieval (MVR) models, exemplified by ColBERT, have set new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity creates severe storage and retrieval efficiency bottlenecks: to manage the memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise introduces two critical limitations: excessive indexing latency from clustering large-scale corpora, and semantic information loss inherent to compression. In this paper, we propose Single-stage Sparse Retrieval (SSR), a paradigm shift that replaces expensive clustering with efficient sparse coding. Instead of compressing features into low-dimensional dense vectors, we use a Sparse Autoencoder (SAE) to project token embeddings into a high-dimensional but highly sparse representation. This transformation lets us bypass vector clustering entirely and use inverted indexing for precise, high-throughput retrieval. Extensive experiments on the BEIR benchmark demonstrate that SSR achieves a "trifecta" of improvements: it reduces indexing time by 15x compared to ColBERTv2, halves retrieval latency, and simultaneously improves retrieval performance over leading baselines.
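The pipeline described above (sparse-code each token embedding, then retrieve through an inverted index) can be illustrated with a minimal sketch. It is not the SSR implementation: the encoder here is an untrained random matrix standing in for a trained SAE, the top-k ReLU sparsification, the per-document max-pooling of activations, and the dot-product scoring are all simplifying assumptions, and the function names `make_encoder`, `sparse_encode`, `build_index`, and `search` are hypothetical.

```python
import heapq
import random
from collections import defaultdict


def make_encoder(dim_in, dim_hidden, seed=0):
    """Random stand-in for trained SAE encoder weights (assumption, not a trained model)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim_in)] for _ in range(dim_hidden)]


def sparse_encode(vec, encoder, k):
    """Project vec to len(encoder) dims, apply ReLU, keep at most k positive activations."""
    acts = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in encoder]
    top = heapq.nlargest(k, range(len(acts)), key=acts.__getitem__)
    return {j: acts[j] for j in top if acts[j] > 0.0}


def build_index(docs, encoder, k):
    """Build feature_id -> [(doc_id, weight)] postings.

    docs maps doc_id to a list of token vectors. The posting weight is the
    maximum activation of that feature over the document's tokens
    (a simplification chosen for this sketch).
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        pooled = {}
        for tok in tokens:
            for j, a in sparse_encode(tok, encoder, k).items():
                pooled[j] = max(pooled.get(j, 0.0), a)
        for j, a in pooled.items():
            index[j].append((doc_id, a))
    return index


def search(query_tokens, index, encoder, k, top_n=3):
    """Score documents by walking only the postings of active query features."""
    scores = defaultdict(float)
    for tok in query_tokens:
        for j, q_act in sparse_encode(tok, encoder, k).items():
            for doc_id, d_act in index.get(j, ()):
                scores[doc_id] += q_act * d_act
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
```

In this sketch no clustering step is needed at indexing time: each document is encoded once and appended to the postings of its active features, and a query touches only the postings of the features it activates. A call such as `search(query_tokens, build_index(docs, enc, k=4), enc, k=4)` with `enc = make_encoder(8, 64)` returns the highest-scoring `(doc_id, score)` pairs.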