Large language models (LLMs) are prone to factual errors during inference because they lack sufficient training data and up-to-date knowledge, which leads to hallucination. Retrieval-Augmented Generation (RAG) has gained attention as a promising way to address this limitation: it retrieves relevant information from external sources to generate more accurate answers. Because structured knowledge is pervasive in these external sources, considerable progress has been made in applying graph-related techniques to RAG, enabling more complex reasoning based on the topological relationships among knowledge entities. However, there is currently neither a unified review examining the diverse roles of graphs in RAG nor a comprehensive resource to help researchers navigate and contribute to this evolving field. This survey offers a novel perspective on the functionality of graphs within RAG and their impact on performance across a wide range of graph-structured data. It provides a detailed breakdown of the roles that graphs play in RAG, covering database construction, algorithms, pipelines, and tasks. Finally, it identifies current challenges and outlines future research directions, aiming to inspire further developments in this field. Our graph-centered analysis highlights the commonalities and differences among existing methods, setting the stage for future researchers in areas such as graph learning, database systems, and natural language processing.