Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern retrieval models (RMs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural language. In this paper, we explore the advantages and disadvantages of LLMs and RMs, highlighting their respective strengths in understanding user-issued queries and retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates information refinement through synergy between RMs and LLMs. InteR allows RMs to expand the knowledge in queries using LLM-generated knowledge collections, and enables LLMs to enhance prompt formulation using retrieved documents. This iterative refinement process augments the inputs of both RMs and LLMs, leading to more accurate retrieval. Experiments on large-scale retrieval benchmarks involving web search and low-resource retrieval tasks demonstrate that InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods, even those using relevance judgments. Source code is available at https://github.com/Cyril-JZ/InteR.
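The iterative refinement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the shape of the loop under stated assumptions. The names `retrieve`, `inter_sketch`, and `toy_llm` are hypothetical, a toy term-overlap scorer stands in for the retrieval model, a fixed-output stub stands in for the LLM, and the prompt format and number of rounds are arbitrary choices.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


def retrieve(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy retrieval model: rank documents by term overlap with the query."""
    q = Counter(tokenize(query))

    def score(doc: str) -> int:
        d = Counter(tokenize(doc))
        return sum(min(q[t], d[t]) for t in q)

    # sorted() is stable, so tied documents keep their corpus order.
    return sorted(corpus, key=score, reverse=True)[:k]


def inter_sketch(
    query: str,
    corpus: List[str],
    generate: Callable[[str], str],
    rounds: int = 2,
    k: int = 2,
) -> List[str]:
    """Alternate between LLM knowledge generation and RM retrieval.

    Each round: (1) build an LLM prompt from the current retrieved documents,
    (2) let the LLM produce a knowledge passage, (3) expand the query with
    that passage and retrieve again.
    """
    docs = retrieve(query, corpus, k)
    for _ in range(rounds):
        # Assumed prompt format; the abstract does not specify one.
        prompt = "Context:\n" + "\n".join(docs) + "\nQuestion: " + query
        knowledge = generate(prompt)
        docs = retrieve(query + " " + knowledge, corpus, k)
    return docs


def toy_llm(prompt: str) -> str:
    """Hypothetical LLM stub that always returns the same knowledge text."""
    return "paris france capital"


if __name__ == "__main__":
    corpus = [
        "bananas are yellow",
        "paris is the capital of france",
        "the eiffel tower is in paris",
    ]
    print(inter_sketch("eiffel tower location", corpus, toy_llm))
```

In this toy setting the original query shares no terms with the document about Paris, so that document only surfaces once the query has been expanded with the generated knowledge. A real system would replace `retrieve` with a sparse or dense retriever and `toy_llm` with an actual model call.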