Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training and inference costs. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces an information retrieval step that enhances generation by retrieving relevant objects from available data stores, leading to higher accuracy and better robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating the advancements and pivotal technologies that can inform future progress. We also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. Then, from another view, we survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. GitHub: https://github.com/PKU-DAIR/RAG-Survey.