I^3 Retriever: Incorporating Implicit Interaction in Pre-trained Language Models for Passage Retrieval

Passage retrieval is a fundamental task in many information systems, such as web search and question answering, where both efficiency and effectiveness are critical concerns. In recent years, neural retrievers based on pre-trained language models (PLMs), such as dual-encoders, have achieved great success. Yet studies have found that the performance of dual-encoders is often limited because they neglect the interaction information between queries and candidate passages. Various interaction paradigms have therefore been proposed to improve vanilla dual-encoders. Notably, recent state-of-the-art methods often introduce late interaction during model inference. However, such late-interaction methods usually incur substantial computation and storage costs on large corpora. Despite their effectiveness, these efficiency and space-footprint concerns remain an important factor limiting the application of interaction-based neural retrieval models. To tackle this issue, we incorporate implicit interaction into dual-encoders and propose the I^3 retriever. In particular, our implicit interaction paradigm leverages generated pseudo-queries to simulate query-passage interaction and is optimized jointly with the query and passage encoders in an end-to-end manner. It can be fully pre-computed and cached, and its inference involves only a simple dot product between the query vector and the passage vector, which makes it as efficient as vanilla dual-encoders. We conduct comprehensive experiments on the MSMARCO and TREC 2019 Deep Learning datasets, demonstrating the I^3 retriever's superiority in both effectiveness and efficiency. Moreover, the proposed implicit interaction is compatible with special pre-training and knowledge distillation for passage retrieval, which together yield new state-of-the-art performance.
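
The efficiency claim above can be illustrated with a toy sketch. It is not the I^3 implementation: `toy_encode`, `encode_passage_with_pseudo_queries`, the averaging fusion, and the example corpus are all hypothetical placeholders for the learned PLM encoders and the learned implicit-interaction module. What it is meant to show is the split the abstract describes, under the assumption that pseudo-queries are used only on the passage side: passage vectors are computed and cached offline, and online scoring reduces to one query encoding plus a dot product per cached passage.

```python
import hashlib
import math

DIM = 16  # hypothetical embedding size


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a PLM encoder: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode_passage_with_pseudo_queries(passage: str, pseudo_queries: list[str]) -> list[float]:
    """Placeholder for the implicit-interaction passage encoder.

    Assumption: passage and pseudo-query vectors are fused by plain averaging;
    the actual model learns its interaction end to end.
    """
    vecs = [toy_encode(passage)] + [toy_encode(q) for q in pseudo_queries]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# Offline: all passage-side work (including pseudo-queries) is pre-computed and cached.
corpus = {
    "p1": ("dual encoders embed queries and passages separately", ["what is a dual encoder"]),
    "p2": ("late interaction compares token embeddings at query time", ["how does late interaction work"]),
}
index = {pid: encode_passage_with_pseudo_queries(text, pqs) for pid, (text, pqs) in corpus.items()}


# Online: encode the query once, then score every cached passage with a dot product.
def search(query: str, k: int = 1) -> list[tuple[str, float]]:
    q = toy_encode(query)
    scored = [(pid, dot(q, vec)) for pid, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]
```

For example, `search("what is a dual encoder", k=2)` returns both passage ids ranked by score. The design point is that nothing query-dependent touches the passage side at inference time, which is what lets the index be built once, in contrast to late-interaction models that compare token-level representations per query.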
