As a primary means of information acquisition, information retrieval (IR) systems, such as search engines, have become part of our daily lives. These systems also serve as components of dialogue, question-answering, and recommender systems. IR has evolved from its origins in term-based methods to its integration with advanced neural models. While neural models excel at capturing complex contextual signals and semantic nuances, thereby reshaping the IR landscape, they still face challenges such as data scarcity, limited interpretability, and the generation of contextually plausible yet potentially inaccurate responses. This evolution calls for combining traditional methods (such as term-based sparse retrieval, which offers rapid response) with modern neural architectures (such as language models, which offer powerful language understanding). Meanwhile, the emergence of large language models (LLMs), typified by ChatGPT and GPT-4, has revolutionized natural language processing owing to their remarkable language understanding, generation, generalization, and reasoning abilities. Consequently, recent research has sought to leverage LLMs to improve IR systems. Given the rapid pace of this research, it is necessary to consolidate existing methodologies and provide nuanced insights through a comprehensive overview. In this survey, we examine the confluence of LLMs and IR systems, covering crucial components such as query rewriters, retrievers, rerankers, and readers. We also explore promising directions within this expanding field.