Graph retrieval-augmented generation (GraphRAG) has emerged as a powerful paradigm for enhancing large language models (LLMs) with external knowledge. It leverages graphs to model the hierarchical structure between specific concepts, enabling more coherent and effective knowledge retrieval for accurate reasoning. Despite its conceptual promise, recent studies report that GraphRAG frequently underperforms vanilla RAG on many real-world tasks. This raises a critical question: Is GraphRAG really effective, and in which scenarios do graph structures provide measurable benefits for RAG systems? To address this, we propose GraphRAG-Bench, a comprehensive benchmark designed to evaluate GraphRAG models on both hierarchical knowledge retrieval and deep contextual reasoning. GraphRAG-Bench features a comprehensive dataset with tasks of increasing difficulty, covering fact retrieval, complex reasoning, contextual summarization, and creative generation, and a systematic evaluation across the entire pipeline, from graph construction and knowledge retrieval to final generation. Leveraging this novel benchmark, we systematically investigate the conditions under which GraphRAG surpasses traditional RAG and the underlying reasons for its success, offering guidelines for its practical application. All related resources and analyses are collected for the community at https://github.com/GraphRAG-Bench/GraphRAG-Benchmark.