Retrieval-Augmented Generation for AI-Generated Content: A Survey

The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by advancements in model algorithms, scalable foundation model architectures, and the availability of ample high-quality datasets. While AIGC has achieved remarkable performance, it still faces challenges, such as the difficulty of maintaining up-to-date and long-tail knowledge, the risk of data leakage, and the high costs associated with training and inference. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces an information retrieval process, which enhances AIGC results by retrieving relevant objects from available data stores, leading to greater accuracy and robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator, and we distill the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, highlighting advancements and pivotal technologies that can inform future progress. We also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. From another perspective, we survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Project: https://github.com/hymie122/RAG-Survey
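
As a rough illustration of the retrieve-then-generate idea described above, the following minimal sketch retrieves the most relevant entries from a small in-memory data store and prepends them to the generator's input. Everything here is an assumption made for illustration and is not taken from the survey: the bag-of-words cosine retriever, the prompt layout in `augment`, the toy `DATA_STORE`, and the placeholder `generate` function, which stands in for a real generative model. Prepending retrieved text to the query is only one of several ways a retriever can augment a generator.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, data_store, k=2):
    """Return the k entries of data_store most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in data_store]
    # Sort by score only; Python's sort is stable, so ties keep store order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def augment(query, retrieved):
    """Build the generator input by prepending retrieved entries (assumed layout)."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}"


def generate(prompt):
    """Hypothetical placeholder: a real system would call a generative model here."""
    return f"[model output conditioned on {len(prompt)} characters of input]"


# Toy data store, invented for this example.
DATA_STORE = [
    "RAG retrieves relevant objects from a data store before generation.",
    "Diffusion models generate images by iterative denoising.",
    "Retrieval can supply up-to-date and long-tail knowledge to a generator.",
]

query = "How does retrieval help a generator with long-tail knowledge?"
top = retrieve(query, DATA_STORE, k=2)
prompt = augment(query, top)
answer = generate(prompt)
```

In a practical system the lexical retriever would typically be replaced by a sparse or dense index over a much larger store, and `generate` by an actual model call; the control flow of retrieve, augment, then generate stays the same.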
