Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state-of-the-art results on many information retrieval benchmarks. However, their non-linear scoring function cannot be scaled to millions of documents, necessitating a three-stage inference process: retrieving initial candidates via token retrieval, accessing all token vectors, and scoring the initial candidate documents. The non-linear scoring function is applied over all token vectors of each candidate document, making inference complicated and slow. In this paper, we aim to simplify multi-vector retrieval by rethinking the role of token retrieval. We present XTR, ConteXtualized Token Retriever, which introduces a simple yet novel objective function that encourages the model to retrieve the most important document tokens first. The improvement to token retrieval allows XTR to rank candidates using only the retrieved tokens rather than all tokens in the document, and enables a newly designed scoring stage that is two to three orders of magnitude cheaper than that of ColBERT. On the popular BEIR benchmark, XTR advances the state of the art by 2.8 nDCG@10 points without any distillation. Detailed analysis confirms our decision to revisit the token retrieval stage, as XTR demonstrates much better token retrieval recall than ColBERT.
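To make the contrast between the two scoring stages concrete, the following is a minimal sketch, not the authors' implementation. It assumes the non-linear scoring function is ColBERT's sum-of-max over token similarities. The function names, the `retrieved_mask` argument, and the choice to let a query token with no retrieved document token contribute zero are all illustrative assumptions; the paper's handling of missing similarities may differ.

```python
import numpy as np


def sum_of_max_score(query_vecs, doc_vecs):
    """Sum-of-max scoring over ALL document tokens (ColBERT-style).

    query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim).
    """
    sim = query_vecs @ doc_vecs.T  # token-level similarity matrix
    return float(sim.max(axis=1).sum())


def retrieved_only_score(query_vecs, doc_vecs, retrieved_mask):
    """Sum-of-max restricted to tokens returned by token retrieval.

    retrieved_mask[i, j] is True if document token j was retrieved for
    query token i (hypothetical input format). Assumption: a query token
    with no retrieved document token contributes 0 to the score.
    """
    sim = query_vecs @ doc_vecs.T
    masked = np.where(retrieved_mask, sim, -np.inf)  # hide non-retrieved tokens
    best = masked.max(axis=1)
    best = np.where(np.isfinite(best), best, 0.0)  # rows with nothing retrieved
    return float(best.sum())


# Toy example: 2 query tokens, 3 document tokens, 2-dimensional embeddings.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
mask = np.array([[True, False, False], [False, False, True]])

print(sum_of_max_score(query, doc))            # 2.0
print(retrieved_only_score(query, doc, mask))  # 1.5
```

In this toy case the second score is lower because the best match for the second query token was not retrieved. That gap is what a training objective favoring retrieval of the most important document tokens would aim to shrink. In a real system the restricted variant would also avoid loading the non-retrieved token vectors at all, which this dense-matrix sketch does not model.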