Information retrieval (IR) plays a crucial role in locating relevant resources in vast amounts of data, and its applications have evolved from traditional knowledge bases to modern search engines (SEs). The emergence of large language models (LLMs) has further changed the field by enabling users to interact with search systems in natural language. In this paper, we examine the advantages and disadvantages of LLMs and SEs, highlighting their respective strengths: LLMs in understanding user-issued queries, and SEs in retrieving up-to-date information. To leverage the benefits of both paradigms while circumventing their limitations, we propose InteR, a novel framework that facilitates knowledge refinement through interaction between SEs and LLMs. InteR allows SEs to refine the knowledge in queries using LLM-generated summaries and enables LLMs to enhance their prompts using SE-retrieved documents. This iterative refinement process augments the inputs of both SEs and LLMs, leading to more accurate retrieval. Experimental evaluations on two large-scale retrieval benchmarks demonstrate that InteR achieves superior zero-shot document retrieval performance compared to state-of-the-art methods, regardless of whether relevance judgments are used.
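The interaction loop described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the authors' implementation: `toy_search` is a hypothetical term-overlap ranker standing in for a real search engine, `llm` is any callable that maps a prompt string to a summary string, and the prompt wording and number of rounds are placeholders.

```python
from typing import Callable, List


def toy_search(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Hypothetical stand-in for a search engine: rank documents by term overlap."""
    q_terms = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q_terms & set(doc.lower().split())),
        reverse=True,  # sorted() is stable, so ties keep corpus order
    )
    return ranked[:k]


def inter_loop(
    query: str,
    corpus: List[str],
    llm: Callable[[str], str],
    rounds: int = 2,
    k: int = 2,
) -> List[str]:
    """Alternate between the SE and the LLM, each refining the other's input."""
    docs = toy_search(query, corpus, k)
    for _ in range(rounds):
        # LLM side: the prompt is enhanced with the SE-retrieved documents.
        prompt = (
            "Context:\n" + "\n".join(docs)
            + f"\nQuestion: {query}\n"
            + "Summarize the knowledge relevant to the question."
        )
        summary = llm(prompt)
        # SE side: the query is refined with the LLM-generated summary.
        expanded_query = f"{query} {summary}"
        docs = toy_search(expanded_query, corpus, k)
    return docs
```

With a stub such as `llm = lambda prompt: "photovoltaic semiconductors electricity"`, the expanded query can surface documents that share no terms with the original query. In practice the stub would be replaced by a call to an actual model, and `toy_search` by a sparse or dense retriever.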