Large language models (LLMs) inevitably exhibit hallucinations, since the accuracy of generated text cannot be guaranteed solely by the parametric knowledge they encapsulate. Although retrieval-augmented generation (RAG) is a practical complement to LLMs, it relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves when retrieval goes wrong. To this end, we propose Corrective Retrieval Augmented Generation (CRAG) to improve the robustness of generation. Specifically, a lightweight retrieval evaluator is designed to assess the overall quality of the documents retrieved for a query and returns a confidence degree that determines which knowledge retrieval action is triggered. Since retrieval from static and limited corpora can only return sub-optimal documents, large-scale web search is used as an extension to augment the retrieval results. In addition, a decompose-then-recompose algorithm is applied to the retrieved documents to selectively focus on key information and filter out irrelevant content. CRAG is plug-and-play and can be seamlessly coupled with various RAG-based approaches. Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.
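The evaluator-driven routing described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the CRAG implementation: the three action names (`correct`, `incorrect`, `ambiguous`), the thresholds `upper`/`lower`, aggregating per-document scores with `max`, the naive sentence-level `decompose`, the word-overlap `overlap_score` (a toy stand-in for the lightweight learned evaluator), and `fake_web_search` (a stand-in for large-scale web search) are all hypothetical.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical scorer signature: (query, text) -> relevance in [0, 1].
Scorer = Callable[[str, str], float]


def choose_action(confidence: float, upper: float = 0.6, lower: float = 0.2) -> str:
    """Map an evaluator confidence to one of three (hypothetical) retrieval actions."""
    if confidence >= upper:
        return "correct"      # trust the retrieved documents
    if confidence <= lower:
        return "incorrect"    # discard them and fall back to web search
    return "ambiguous"        # use both sources


def decompose(doc: str) -> List[str]:
    """Split a document into sentence-level strips (naive regex split, an assumption)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]


def recompose(query: str, docs: List[str], scorer: Scorer, keep: float = 0.3) -> str:
    """Keep only the strips whose relevance to the query exceeds `keep`, then rejoin them."""
    strips = [s for d in docs for s in decompose(d)]
    return " ".join(s for s in strips if scorer(query, s) > keep)


def corrective_retrieve(
    query: str,
    docs: List[str],
    evaluator: Scorer,
    web_search: Callable[[str], List[str]],
) -> Tuple[str, str]:
    """Score the retrieved documents, pick an action, and build the refined knowledge."""
    # Assumption: overall confidence is the best per-document score.
    confidence = max((evaluator(query, d) for d in docs), default=0.0)
    action = choose_action(confidence)
    if action == "correct":
        knowledge = recompose(query, docs, evaluator)
    elif action == "incorrect":
        knowledge = recompose(query, web_search(query), evaluator)
    else:
        parts = [
            recompose(query, docs, evaluator),
            recompose(query, web_search(query), evaluator),
        ]
        knowledge = " ".join(p for p in parts if p)
    return action, knowledge


def overlap_score(query: str, text: str) -> float:
    """Toy evaluator: fraction of query words that also appear in the text."""
    q = set(re.findall(r"\w+", query.lower()))
    t = set(re.findall(r"\w+", text.lower()))
    return len(q & t) / len(q) if q else 0.0


def fake_web_search(query: str) -> List[str]:
    """Hypothetical web-search stand-in that returns a fixed page."""
    return ["Paris is the capital of France. It hosts the Louvre."]


if __name__ == "__main__":
    print(corrective_retrieve("capital of France", ["Bananas are yellow."],
                              overlap_score, fake_web_search))
```

With an irrelevant retrieved document the confidence falls below `lower`, so the sketch discards it, queries the stand-in web search, and keeps only the sentence that mentions the query terms. In a real system the evaluator would be a trained model and the thresholds would be tuned; the routing structure is the part this sketch is meant to illustrate.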