We study efficient multi-vector retrieval for late interaction in any modality. Late interaction has emerged as a dominant paradigm for information retrieval in text, images, visual documents, and videos, but its computation and storage costs grow linearly with document length, making it costly for image-, video-, and audio-rich corpora. To address this limitation, we explore query-agnostic methods for compressing multi-vector document representations under a constant vector budget. We introduce four approaches for index compression: sequence resizing, memory tokens, hierarchical pooling, and a novel attention-guided clustering (AGC). AGC uses an attention-guided mechanism to identify the most semantically salient regions of a document as cluster centroids and to weight token aggregation. Evaluating these methods on retrieval tasks spanning text (BEIR), visual-document (ViDoRe), and video (MSR-VTT, MultiVENT 2.0), we show that attention-guided clustering consistently outperforms other parameterized compression methods (sequence resizing and memory tokens), provides greater flexibility in index size than non-parametric hierarchical clustering, and achieves competitive or improved performance compared to a full, uncompressed index. The source code is available at: github.com/hanxiangqin/omni-col-press.
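To make the cost argument concrete, the following is a minimal sketch of late-interaction (MaxSim) scoring and of compressing a document to a fixed vector budget. It is not the released implementation: the function names `maxsim` and `compress`, the per-token `saliency` input, the top-k choice of centroids, and the dot-product assignment rule are all assumptions made for illustration, since the abstract specifies only that attention identifies the centroids and weights the aggregation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v] if norm > 0 else list(v)


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector takes its best-matching
    document vector, and the per-query maxima are summed. The cost is
    proportional to len(query_vecs) * len(doc_vecs)."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def compress(doc_vecs, saliency, budget):
    """Hypothetical attention-guided clustering (an assumption, not the
    paper's exact procedure): the `budget` most salient tokens act as
    centroids, every token joins its most similar centroid, and each
    cluster is pooled as a saliency-weighted sum, then renormalized."""
    budget = min(budget, len(doc_vecs))
    order = sorted(range(len(doc_vecs)), key=lambda i: saliency[i], reverse=True)
    centroid_ids = order[:budget]
    sums = [[0.0] * len(doc_vecs[0]) for _ in centroid_ids]
    for i, vec in enumerate(doc_vecs):
        # Assign token i to the centroid with the highest dot product.
        c = max(range(budget), key=lambda k: dot(vec, doc_vecs[centroid_ids[k]]))
        for j, a in enumerate(vec):
            sums[c][j] += saliency[i] * a
    return [normalize(s) for s in sums]


# Toy document: four unit-length token vectors forming two rough groups.
doc = [normalize(v) for v in [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]]
saliency = [0.4, 0.1, 0.3, 0.2]  # assumed non-negative attention mass per token
query = [[1.0, 0.0]]

compressed = compress(doc, saliency, budget=2)
full_score = maxsim(query, doc)
compressed_score = maxsim(query, compressed)
```

In this sketch the index stores `budget` vectors per document regardless of its length, so both storage and the inner loop of `maxsim` stop growing with the number of tokens. Because the compression never sees the query, it can run once at indexing time.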