Multi-vector representations generated by late interaction models, such as ColBERT, yield higher retrieval quality than single-vector representations in information retrieval applications. In multi-vector retrieval systems, queries and documents are each encoded as one embedding per token, and query-document similarity is computed with MaxSim, which sums, over the query tokens, the maximum similarity between that query token and any document token. However, this improved quality comes at the expense of significantly increased search latency. In this work, we introduce LEMUR, a simple yet efficient framework for multi-vector similarity search. LEMUR consists of two consecutive problem reductions. First, we formulate multi-vector similarity search as a supervised learning problem that can be solved using a one-hidden-layer neural network. Second, we reduce inference under this model to single-vector similarity search in its latent space, enabling the use of existing single-vector search indexes to accelerate retrieval. LEMUR is an order of magnitude faster than prior multi-vector similarity search methods. Our code is available at https://github.com/ejaasaari/lemur
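For concreteness, the MaxSim scoring that multi-vector retrieval relies on can be sketched in a few lines of plain Python. This is only an illustration of the standard MaxSim definition with exhaustive search, not LEMUR itself; the function names (`dot`, `maxsim`, `brute_force_search`), the use of the inner product as token-level similarity, and the toy embeddings are all assumptions made for the example.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query: Sequence[Vector], document: Sequence[Vector]) -> float:
    """MaxSim score: for each query token embedding, take the highest
    inner product with any document token embedding, then sum the
    results over all query tokens."""
    return sum(max(dot(q, d) for d in document) for q in query)


def brute_force_search(
    query: Sequence[Vector],
    corpus: Sequence[Sequence[Vector]],
    k: int = 1,
) -> List[int]:
    """Exhaustive top-k retrieval: score every document with MaxSim.
    The cost grows with (query tokens x document tokens x corpus size),
    which is the latency problem faster methods aim to avoid."""
    scored = [(maxsim(query, doc), i) for i, doc in enumerate(corpus)]
    # Sort by descending score, breaking ties by document index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


# Toy data (made up for illustration): 2-dimensional token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = [
    [[1.0, 0.0], [0.5, 0.5]],  # document 0
    [[0.0, 1.0], [1.0, 0.0]],  # document 1
    [[-1.0, 0.0]],             # document 2
]

top2 = brute_force_search(query, corpus, k=2)
```

Every document must be scored against every query token here, so the sketch mainly shows why exhaustive MaxSim search is slow on large corpora and why reducing the problem to single-vector search is attractive.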