Information retrieval (IR) plays a crucial role in locating relevant resources from vast amounts of data, and its applications have evolved from traditional knowledge bases to modern search engines (SEs). The emergence of large language models (LLMs) has further revolutionized the IR field by enabling users to interact with search systems in natural language. In this paper, we explore the advantages and disadvantages of LLMs and SEs, highlighting their respective strengths: LLMs in understanding user-issued queries and SEs in retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates knowledge refinement through interaction between SEs and LLMs. InteR allows SEs to expand queries using LLM-generated knowledge collections and enables LLMs to improve prompt formulation using SE-retrieved documents. This iterative refinement process augments the inputs of both SEs and LLMs, leading to more accurate retrieval. Experiments on large-scale retrieval benchmarks covering web search and low-resource retrieval tasks demonstrate that InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods, even those that use relevance judgments. Source code is available at https://github.com/Cyril-JZ/InteR.
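The interaction described above can be illustrated with a minimal sketch. It is not the released InteR implementation: the term-overlap retriever, the prompt template, the `stub_llm` function, the toy corpus, and the names `search` and `inter_loop` are all hypothetical stand-ins chosen for illustration. A real setup would presumably use a proper search engine and an actual LLM call.

```python
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def search(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy search engine: rank documents by term overlap with the query."""
    q = Counter(tokenize(query))

    def score(doc: str) -> int:
        d = Counter(tokenize(doc))
        return sum(min(q[t], d[t]) for t in q)

    # sorted() is stable, so ties keep their original corpus order.
    return sorted(corpus, key=score, reverse=True)[:k]


def inter_loop(
    query: str,
    corpus: List[str],
    llm: Callable[[str], str],
    rounds: int = 2,
    k: int = 2,
) -> List[str]:
    """Alternate between LLM knowledge generation and SE retrieval."""
    docs = search(query, corpus, k)
    for _ in range(rounds):
        # LLM side: the prompt is enriched with SE-retrieved documents.
        prompt = "Context:\n" + "\n".join(docs) + "\nQuestion: " + query
        knowledge = llm(prompt)
        # SE side: the query is expanded with LLM-generated knowledge.
        expanded_query = query + " " + knowledge
        docs = search(expanded_query, corpus, k)
    return docs


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns fixed 'knowledge'."""
    return "louvre museum paris"


corpus = [
    "the eiffel tower is in paris",
    "paris is the capital of france",
    "bananas are rich in potassium",
    "the louvre museum houses the mona lisa",
]

top_docs = inter_loop("where is the mona lisa", corpus, stub_llm)
```

The loop keeps the original query in every round and only appends the generated knowledge, so an unhelpful LLM response in this sketch can dilute the query but cannot remove the user's own terms from it.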