Dense retrieval systems have proven effective across various benchmarks, but they require substantial memory to store large search indices. Recent advances in embedding compression show that index sizes can be greatly reduced with minimal loss in ranking quality. However, existing studies often overlook the role of corpus complexity -- a critical factor, as recent work shows that both corpus size and document length strongly affect dense retrieval performance. In this paper, we introduce CoRECT (Controlled Retrieval Evaluation of Compression Techniques), a framework for large-scale evaluation of embedding compression methods, supported by a newly curated dataset collection. To demonstrate its utility, we benchmark eight representative compression methods. Notably, we show that non-learned compression achieves substantial index size reduction, even on corpora of up to 100M passages, with statistically insignificant performance loss. However, selecting the optimal compression method remains challenging, as performance varies across models. This variability highlights the necessity of CoRECT for enabling consistent comparison and informed selection of compression methods. All code, data, and results are available on GitHub and HuggingFace.