While multi-vector retrieval models outperform single-vector models of comparable size in retrieval quality, their practicality is limited by substantially larger index sizes, driven by the additional sequence-length dimension in their document embeddings. Because document embedding size dictates both memory overhead and query latency, compression is essential for deployment. In this work, we present an evaluation of training-free methods targeting the token sequence length, a dimension unique to multi-vector retrieval. Our findings suggest that token merging is strictly superior to token pruning for reducing index size while maintaining retrieval effectiveness.
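To make the distinction between the two families of methods concrete, the following is a minimal sketch, not the methods evaluated in this work. It assumes a document embedding is a NumPy array of shape (tokens, dim). The function names `prune_tokens` and `merge_tokens`, the fixed-stride pruning rule, and the adjacent-window mean pooling are all illustrative assumptions chosen for brevity.

```python
import numpy as np


def prune_tokens(doc: np.ndarray, factor: int) -> np.ndarray:
    """Pruning (illustrative): keep every `factor`-th token vector, drop the rest."""
    return doc[::factor]


def merge_tokens(doc: np.ndarray, factor: int) -> np.ndarray:
    """Merging (illustrative): mean-pool each run of `factor` adjacent token vectors."""
    groups = [doc[i:i + factor] for i in range(0, len(doc), factor)]
    return np.stack([group.mean(axis=0) for group in groups])


# Hypothetical document: 12 token vectors of dimension 8.
rng = np.random.default_rng(0)
doc = rng.normal(size=(12, 8))

pruned = prune_tokens(doc, factor=3)
merged = merge_tokens(doc, factor=3)

# Both outputs store 4 vectors instead of 12, so the per-document footprint
# shrinks by the same factor. Pruning discards the dropped vectors outright,
# while merging folds every original vector into one of the kept ones.
print(doc.shape, pruned.shape, merged.shape)
```

Both functions shorten the sequence-length dimension by the same factor, so any difference in retrieval effectiveness comes from which information survives, not from the storage budget.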