Multi-vector representations generated by late-interaction models such as ColBERT achieve superior retrieval quality compared to single-vector representations in information retrieval applications. In multi-vector retrieval, both queries and documents are encoded with one embedding per token, and query–document similarity is measured with the MaxSim operator, which sums, over the query tokens, each token's maximum similarity to any document token. However, the improved recall of multi-vector retrieval comes at the cost of significantly increased latency, which necessitates efficient approximate nearest neighbor search (ANNS) algorithms for multi-vector search. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions: we first formulate multi-vector similarity search as a supervised learning problem that can be solved with a one-hidden-layer neural network; we then reduce inference under this model to single-vector similarity search in its latent space, which enables existing single-vector ANNS methods to speed up retrieval. In addition to evaluating performance on ColBERTv2 embeddings, we evaluate LEMUR on embeddings generated by modern multi-vector text models and by multi-vector visual document retrieval models. LEMUR is an order of magnitude faster than earlier multi-vector similarity search methods.
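The MaxSim scoring described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes dot-product token similarity (as with unit-normalized embeddings), and the function name `maxsim_score` is hypothetical.

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction (MaxSim) similarity.

    query_emb: (num_query_tokens, dim) token embeddings of the query.
    doc_emb:   (num_doc_tokens, dim) token embeddings of the document.

    For each query token, take the maximum dot product with any document
    token, then sum these maxima over the query tokens.
    """
    # sim[i, j] = <q_i, d_j> for query token i and document token j
    sim = query_emb @ doc_emb.T
    # Best-matching document token per query token, summed over the query
    return float(sim.max(axis=1).sum())
```

Ranking a collection then means evaluating `maxsim_score` against every document's token matrix, which is exactly the per-token work that motivates faster approximate methods such as LEMUR.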