The bi-encoder architecture provides a framework for understanding machine-learned retrieval models based on dense and sparse vector representations. Although these representations capture parametric realizations of the same underlying conceptual framework, their respective implementations of top-$k$ similarity search require the coordination of different software components (e.g., inverted indexes, HNSW indexes, and toolkits for neural inference), often knitted together in complex architectures. In this work, we ask the following question: What's the simplest design, in terms of requiring the fewest changes to existing infrastructure, that can support end-to-end retrieval with modern dense and sparse representations? The answer appears to be that Lucene is sufficient, as we demonstrate in Anserini, a toolkit for reproducible information retrieval research. That is, effective retrieval with modern single-vector neural models can be efficiently performed directly in Java on the CPU. We examine the implications of this design for information retrieval researchers pushing the state of the art as well as for software engineers building production search systems.
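To make the shared framework concrete, here is a minimal sketch of top-$k$ retrieval as inner-product ranking over document vectors, in which the dense and sparse cases differ only in how a vector is stored and scored. It is a brute-force illustration, not how Lucene or Anserini implement search (those rely on inverted indexes and HNSW indexes). The function names `dot_dense`, `dot_sparse`, and `top_k`, and the toy vectors, are hypothetical.

```python
import heapq


def dot_dense(q, d):
    """Inner product of two equal-length dense vectors (lists of floats)."""
    return sum(a * b for a, b in zip(q, d))


def dot_sparse(q, d):
    """Inner product of two sparse vectors stored as {term: weight} dicts."""
    if len(q) > len(d):
        q, d = d, q  # iterate over the shorter vector
    return sum(w * d.get(term, 0.0) for term, w in q.items())


def top_k(query, docs, score, k):
    """Brute-force top-k: score every document and keep the k highest.

    docs maps a document id to its vector; score is dot_dense or dot_sparse.
    Returns a list of (doc_id, score) pairs in descending score order.
    """
    scored = ((score(query, vec), doc_id) for doc_id, vec in docs.items())
    return [(doc_id, s) for s, doc_id in heapq.nlargest(k, scored)]


# Toy data (assumed for illustration only).
dense_docs = {"d1": [1.0, 0.0], "d2": [0.5, 0.5], "d3": [0.0, 1.0]}
sparse_docs = {
    "a": {"lucene": 2.0},
    "b": {"java": 1.0, "cpu": 3.0},
    "c": {"python": 1.0},
}

dense_hits = top_k([1.0, 0.0], dense_docs, dot_dense, k=2)
sparse_hits = top_k({"lucene": 1.0, "java": 0.5}, sparse_docs, dot_sparse, k=2)
```

The same `top_k` routine serves both representations; only the scoring function changes. Real systems replace the exhaustive scan with an index suited to each representation.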