Graph-based retrieval-augmented generation (Graph-based RAG) has shown significant potential for enhancing Large Language Models (LLMs) with structured knowledge. However, existing methods face three critical challenges: Inaccurate Graph Construction, caused by LLM hallucination; Poor Reasoning Ability, caused by the failure to generate explicit reasons that tell the LLM why certain chunks were selected; and Inadequate Answering, where inadequate LLM reasoning leads to answers that only partially address the query. As a result, their performance lags behind NaiveRAG on certain tasks. To address these issues, we propose AGRAG, an advanced graph-based retrieval-augmented generation framework. When constructing the graph, AGRAG replaces the widely used LLM-based entity extraction with a statistics-based method, avoiding hallucination and error propagation. During retrieval, AGRAG formulates graph reasoning as the Minimum Cost Maximum Influence (MCMI) subgraph generation problem, which seeks to include more nodes with high influence scores while incurring less edge cost, making the generated reasoning paths more comprehensive. We prove that this problem is NP-hard and propose a greedy algorithm to solve it. The generated MCMI subgraph serves as an explicit reasoning path that tells the LLM why certain chunks were retrieved, helping the LLM focus on the query-related content of the chunks, reducing the impact of noise, and improving AGRAG's reasoning ability. Furthermore, compared with simple tree-structured reasoning paths, the MCMI subgraph allows more complex graph structures, such as cycles, and improves the comprehensiveness of the generated reasoning paths. The code and prompts of AGRAG are released at: https://github.com/Wyb0627/AGRAG.
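To make the trade-off between node influence and edge cost concrete, the following is a minimal sketch of a generic greedy heuristic, not AGRAG's actual algorithm, which the abstract does not specify. It assumes a fixed budget on total edge cost and an influence-per-cost selection rule; the function name `greedy_subgraph`, the `budget` parameter, and the toy graph are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def greedy_subgraph(
    edges: Dict[Edge, float],
    influence: Dict[str, float],
    seeds: Set[str],
    budget: float,
) -> Tuple[Set[str], List[Edge]]:
    """Grow a subgraph outward from `seeds`, picking at each step the
    frontier edge with the best influence-per-cost ratio.

    Assumption (not from the paper): total edge cost is capped by `budget`.
    """
    # Build an undirected adjacency list from the edge-cost map.
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), cost in edges.items():
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    nodes = set(seeds)
    chosen: List[Edge] = []
    spent = 0.0
    while True:
        best = None  # (ratio, u, v, cost) of the best affordable frontier edge
        for u in sorted(nodes):  # sorted for a deterministic scan order
            for v, cost in adj.get(u, []):
                if v in nodes or spent + cost > budget:
                    continue
                ratio = influence.get(v, 0.0) / max(cost, 1e-9)
                if best is None or ratio > best[0]:
                    best = (ratio, u, v, cost)
        if best is None:
            break  # no affordable edge reaches a new node
        _, u, v, cost = best
        nodes.add(v)
        chosen.append((u, v))
        spent += cost
    return nodes, chosen


# Toy example: "q" stands for a query-matched seed node.
toy_edges = {("q", "a"): 1.0, ("q", "b"): 2.0, ("a", "c"): 1.0, ("b", "c"): 0.5}
toy_influence = {"a": 0.9, "b": 0.4, "c": 0.8}
nodes, chosen = greedy_subgraph(toy_edges, toy_influence, {"q"}, budget=2.5)
```

In this toy run the expensive edge `q-b` is skipped and `b` is reached through `c` instead, which illustrates why a cost-aware rule can produce different paths than simply picking the highest-scoring neighbors. Note that this sketch only ever adds edges to new nodes, so its output is tree-shaped; it does not reproduce the cycle-permitting property that the abstract attributes to the MCMI subgraph.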