Retrieval-Augmented Generation (RAG) has proven valuable for mitigating outdated knowledge and hallucination by supplying LLMs with up-to-date, relevant knowledge. However, RAG still struggles to understand complex multi-hop queries and to retrieve the documents relevant to them, since such queries require the LLM to reason and retrieve step by step. Inspired by the human reasoning process, in which people gradually search for the information they need, it is natural to ask whether LLMs can notice what information is missing at each reasoning step. In this work, we first experimentally verify the ability of LLMs to extract information and to identify what information is missing. Based on this finding, we propose a Missing Information Guided Retrieve-Extraction-Solving paradigm (MIGRES), in which we leverage the identified missing information to generate a targeted query that steers subsequent knowledge retrieval. In addition, we design a sentence-level re-ranking filtering approach that removes irrelevant content from documents, and we use the information extraction capability of LLMs to extract useful information from the cleaned-up documents, which in turn bolsters the overall efficacy of RAG. Extensive experiments on multiple public datasets demonstrate the superiority of the proposed MIGRES method, and analytical experiments demonstrate the effectiveness of the proposed modules.
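
The retrieve-extract-solve loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every LLM call is replaced by a hypothetical rule-based stub (`toy_extract`, `toy_solve`), token overlap stands in for the sentence-level re-ranking model, and the corpus, function names, and `max_steps` parameter are all assumptions made for illustration. It shows only the control flow: filter retrieved documents sentence by sentence, extract facts, then either answer or turn the missing information into the next query.

```python
import re
from typing import Callable, List, Optional, Tuple


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_sentences(query: str, document: str, min_overlap: int = 1) -> List[str]:
    """Keep sentences sharing at least `min_overlap` tokens with the query,
    ordered by overlap. Token overlap is an assumed stand-in for a re-ranker."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    scored = [(len(tokens(query) & tokens(s)), s) for s in sentences]
    kept = [pair for pair in scored if pair[0] >= min_overlap]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in kept]


def migres_loop(
    question: str,
    retrieve: Callable[[str], List[str]],
    extract: Callable[[str, str], Optional[str]],
    solve: Callable[[str, List[str]], Tuple[Optional[str], Optional[str]]],
    max_steps: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Iterate: retrieve, filter, extract, then answer or re-query on what is missing."""
    known: List[str] = []
    query = question
    for _ in range(max_steps):
        for doc in retrieve(query):
            for sentence in filter_sentences(query, doc):
                fact = extract(question, sentence)
                if fact and fact not in known:
                    known.append(fact)
        answer, missing = solve(question, known)
        if answer is not None:
            return answer, known
        if not missing:
            break
        query = missing  # the missing information becomes the next targeted query
    return None, known


# --- Hypothetical toy components (an LLM and a real retriever would go here) ---
CORPUS = [
    "Film X was directed by Jane Doe. It premiered in 1999.",
    "Jane Doe was born in Lyon. She likes tea.",
]
QUESTION = "Where was the director of Film X born?"


def toy_retrieve(query: str) -> List[str]:
    """Return the single corpus document with the highest token overlap."""
    return [max(CORPUS, key=lambda d: len(tokens(query) & tokens(d)))]


def toy_extract(question: str, sentence: str) -> Optional[str]:
    """Stub extractor: treat a sentence as useful if it matches a known pattern."""
    return sentence if re.search(r"directed by|born in", sentence) else None


def toy_solve(question: str, known: List[str]) -> Tuple[Optional[str], Optional[str]]:
    """Stub solver: return (answer, None) if answerable, else (None, missing-info query)."""
    facts = " ".join(known)
    born = re.search(r"born in (\w+)", facts)
    if born:
        return born.group(1), None
    director = re.search(r"directed by ([A-Z]\w+(?: [A-Z]\w+)*)", facts)
    if director:
        return None, f"Where was {director.group(1)} born?"
    return None, "Who directed Film X?"


if __name__ == "__main__":
    print(migres_loop(QUESTION, toy_retrieve, toy_extract, toy_solve))
```

In this toy run, the first retrieval yields only the director's name, so the stub solver reports the birthplace as missing and the loop issues a second, targeted query for it. In a real system the stubs would be prompts to an LLM and the overlap score would be replaced by a learned re-ranker.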