Neural information retrieval often adopts a retrieve-and-rerank framework: a bi-encoder network first retrieves K (e.g., 100) candidates, which a more powerful cross-encoder model then re-ranks so that the most relevant candidates appear first. The re-ranker generally produces better candidate scores than the retriever, but it sees only the top K retrieved candidates and therefore cannot improve retrieval performance as measured by Recall@K. In this work, we leverage the re-ranker to also improve retrieval by providing inference-time relevance feedback to the retriever. Concretely, we update the retriever's query representation for a test instance using a lightweight inference-time distillation of the re-ranker's predictions for that instance. The distillation loss is designed to bring the retriever's candidate scores closer to those of the re-ranker. A second retrieval step is then performed with the updated query vector. We empirically show that our approach, which applies to arbitrary retrieve-and-rerank pipelines, significantly improves retrieval recall across multiple domains, languages, and modalities.
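The query-update loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not a detail confirmed by the abstract: the distillation loss is taken to be a KL divergence between softmax-normalized re-ranker and retriever scores, the retriever scores candidates by dot product, the update is plain gradient descent on the query vector, and the names `retrieve`, `refine_query`, and `toy_reranker` are hypothetical. The real re-ranker would be a cross-encoder; a toy scoring function stands in for it.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def kl_divergence(p, q):
    """KL(p || q) for two strictly positive distributions."""
    return float(np.sum(p * (np.log(p) - np.log(q))))


def retrieve(query_vec, corpus_emb, k):
    """Dot-product retrieval: indices and scores of the top-k corpus items."""
    scores = corpus_emb @ query_vec
    top = np.argsort(-scores)[:k]
    return top, scores[top]


def refine_query(query_vec, cand_emb, reranker_scores, steps=50, lr=0.1):
    """Update only the query vector so that the retriever's score
    distribution over the K candidates moves toward the re-ranker's.

    Assumed loss: KL(softmax(reranker) || softmax(cand_emb @ q)).
    Its gradient with respect to q is cand_emb.T @ (pred - target).
    """
    q = np.asarray(query_vec, dtype=float).copy()
    target = softmax(np.asarray(reranker_scores, dtype=float))
    for _ in range(steps):
        pred = softmax(cand_emb @ q)
        q -= lr * (cand_emb.T @ (pred - target))
    return q


# Toy data: unit-norm random embeddings stand in for a real bi-encoder.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 16))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)
hidden = corpus[0]                           # hypothetical "ideal" query direction
query = hidden + 0.5 * rng.normal(size=16)   # noisy initial query vector


def toy_reranker(cand_emb):
    """Hypothetical stand-in for cross-encoder scores on the candidates."""
    return 5.0 * (cand_emb @ hidden)


K = 10
first_ids, _ = retrieve(query, corpus, K)            # first retrieval step
feedback = toy_reranker(corpus[first_ids])           # re-ranker scores the top K
new_query = refine_query(query, corpus[first_ids], feedback)
second_ids, _ = retrieve(new_query, corpus, K)       # second retrieval step
```

Only the query vector changes in this sketch; the corpus embeddings and both models stay fixed, which is what keeps the feedback step lightweight. The step size and number of steps are arbitrary choices for the toy data and would need tuning for a real retriever.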