Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state-of-the-art results on many information retrieval benchmarks. However, their non-linear scoring function cannot scale to millions of documents, necessitating a three-stage inference process: retrieving initial candidates via token retrieval, accessing all token vectors, and scoring the initial candidate documents. The non-linear scoring function is applied over all token vectors of each candidate document, making inference complicated and slow. In this paper, we aim to simplify multi-vector retrieval by rethinking the role of token retrieval. We present XTR, ConteXtualized Token Retriever, which introduces a simple yet novel objective function that encourages the model to retrieve the most important document tokens first. The improved token retrieval allows XTR to rank candidates using only the retrieved tokens rather than all tokens in the document, and enables a newly designed scoring stage that is two to three orders of magnitude cheaper than that of ColBERT. On the popular BEIR benchmark, XTR advances the state of the art by 2.8 nDCG@10 without any distillation. Detailed analysis supports our decision to revisit the token retrieval stage, as XTR demonstrates much better recall in the token retrieval stage than ColBERT.
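To make the contrast between the two scoring stages concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It compares sum-of-max scoring over all tokens of a document with scoring that uses only the tokens returned by token retrieval. The function names, the brute-force top-k search standing in for an approximate nearest-neighbor index, and the fallback value used when a query token retrieves no token from a candidate document are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def sum_of_max_score(query_vecs, doc_vecs):
    """Sum-of-max over ALL tokens of one document (ColBERT-style scoring)."""
    sim = query_vecs @ doc_vecs.T          # shape: (n_query_tokens, n_doc_tokens)
    return float(sim.max(axis=1).sum())    # best document token per query token, summed


def retrieved_only_scores(query_vecs, token_vecs, token_doc_ids, k):
    """Score candidates from the top-k retrieved tokens only (hypothetical helper)."""
    # Brute-force similarity search; a real system would query an ANN index instead.
    sim = query_vecs @ token_vecs.T
    top_idx = np.argsort(-sim, axis=1, kind="stable")[:, :k]

    best = []      # best[i][doc] = highest retrieved similarity for query token i
    fallback = []  # ASSUMPTION: fill-in value when a document has no retrieved token
    for i, idx in enumerate(top_idx):
        per_doc = {}
        for j in idx:
            doc = int(token_doc_ids[j])
            per_doc[doc] = max(per_doc.get(doc, float("-inf")), float(sim[i, j]))
        best.append(per_doc)
        # Lowest similarity among this query token's retrieved tokens.
        fallback.append(float(sim[i, idx].min()))

    # Candidates are the documents that own at least one retrieved token.
    candidates = set().union(*best)
    return {
        doc: sum(per_doc.get(doc, fb) for per_doc, fb in zip(best, fallback))
        for doc in candidates
    }
```

In this sketch the second function never touches document tokens outside the retrieved set, which is where the cost saving described above would come from. Whether the ranking stays accurate depends on token retrieval surfacing the important tokens, which is what the training objective is meant to encourage.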