I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval

Passage retrieval is a fundamental task in many information systems, such as web search and question answering, where both efficiency and effectiveness are critical concerns. In recent years, neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved great success. Yet studies have found that the performance of dual-encoders is often limited because they neglect the interaction information between queries and candidate passages. Therefore, various interaction paradigms have been proposed to improve the performance of vanilla dual-encoders. In particular, recent state-of-the-art methods often introduce late interaction during model inference. However, such late-interaction-based methods usually incur substantial computation and storage costs on large corpora. Despite their effectiveness, efficiency and storage footprint remain important factors that limit the application of interaction-based neural retrieval models. To tackle this issue, we incorporate implicit interaction into dual-encoders and propose the I^3 retriever. In particular, our implicit interaction paradigm leverages generated pseudo-queries to simulate query-passage interaction, and is jointly optimized with the query and passage encoders in an end-to-end manner. The implicit interaction can be fully pre-computed and cached, and inference involves only a simple dot product of the query vector and the passage vector, which makes it as efficient as vanilla dual-encoders. We conduct comprehensive experiments on the MSMARCO and TREC2019 Deep Learning datasets, demonstrating the I^3 retriever's superiority in terms of both effectiveness and efficiency. Moreover, the proposed implicit interaction is compatible with special pre-training and knowledge distillation for passage retrieval, which yields new state-of-the-art performance.
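
The inference step described above, scoring a query against pre-computed and cached passage vectors with a dot product, can be illustrated with a minimal sketch. This is not the I^3 retriever itself: the encoders and the pseudo-query-based implicit interaction are not shown, and the names `dot`, `rank_passages`, and `passage_index`, as well as all vector values, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    query_vec: List[float],
    passage_index: Dict[str, List[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every cached passage vector against the query and return the top_k."""
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passage_index.items()]
    # Higher dot product means higher estimated relevance.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical passage vectors, assumed to have been produced offline by a
# passage encoder and cached. The values are made up for this example.
passage_index = {
    "p1": [0.9, 0.1, 0.0],
    "p2": [0.2, 0.8, 0.1],
    "p3": [0.1, 0.2, 0.9],
}

# Hypothetical query vector, assumed to come from a query encoder at query time.
query_vec = [1.0, 0.0, 0.5]

ranking = rank_passages(query_vec, passage_index, top_k=2)
print(ranking)
```

Because the passage side is computed offline, the only per-query work in this sketch is one query encoding (not shown) plus one dot product per candidate passage. A real system would typically replace the exhaustive loop with a vector index.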