GraphRAG-R1: Graph Retrieval-Augmented Generation with Process-Constrained Reinforcement Learning

Graph Retrieval-Augmented Generation (GraphRAG) has proven effective at enhancing the reasoning abilities of large language models (LLMs) by using graph structures to represent knowledge and model complex real-world relationships. However, existing GraphRAG methods still face significant bottlenecks on complex problems that require multi-hop reasoning, because their query and retrieval phases rely largely on pre-defined heuristics and do not fully exploit the reasoning potential of LLMs. To address this problem, we propose GraphRAG-R1, an adaptive GraphRAG framework that trains LLMs with process-constrained, outcome-based reinforcement learning (RL) to enhance their multi-hop reasoning ability. Our method can decompose complex problems, autonomously invoke retrieval tools to acquire the necessary information, and reason effectively over the results. Specifically, we use a modified version of Group Relative Policy Optimization (GRPO) that supports rollout-with-thinking. We then design two process-constrained reward functions. To address the shallow retrieval problem, we design a Progressive Retrieval Attenuation (PRA) reward that encourages essential retrievals. To address the over-thinking problem, we design a Cost-Aware F1 (CAF) reward that balances model performance against computational cost. We further design a phase-dependent training strategy with three stages, corresponding to cold start and to each of the two rewards. Lastly, our method adopts hybrid graph-textual retrieval to improve reasoning capacity. Extensive experiments demonstrate that GraphRAG-R1 improves LLM performance on complex reasoning problems compared to state-of-the-art GraphRAG methods on both in-domain and out-of-domain datasets. Furthermore, our framework can be flexibly integrated with various existing retrieval methods and consistently delivers performance improvements.
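To make the reward design more concrete, the following is a minimal illustrative sketch, not the formulation used in GraphRAG-R1. The abstract does not give the functional forms of the PRA and CAF rewards, so the geometric attenuation in `pra_reward`, the exponential cost discount in `caf_reward`, and all parameter names and default values (`base`, `decay`, `cost_rate`) are assumptions chosen only to show the intended behavior: each additional retrieval earns a smaller bonus, and answer quality is traded off against retrieval cost. The group-relative advantage follows the commonly described GRPO normalization (reward minus group mean, divided by group standard deviation) and does not reflect the modifications mentioned above.

```python
from collections import Counter
import math
import statistics


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pra_reward(num_retrievals: int, base: float = 0.5, decay: float = 0.5) -> float:
    """Hypothetical attenuated retrieval bonus (assumed form).

    The k-th retrieval call (0-indexed) contributes base * decay**k, so early
    retrievals are encouraged while later ones add progressively less.
    """
    return sum(base * decay ** k for k in range(num_retrievals))


def caf_reward(prediction: str, reference: str, num_retrievals: int,
               cost_rate: float = 0.1) -> float:
    """Hypothetical cost-aware F1 (assumed form).

    Answer F1 is discounted exponentially in the number of retrieval calls.
    """
    return token_f1(prediction, reference) * math.exp(-cost_rate * num_retrievals)


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO-style advantages: normalize rewards within one rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Three hypothetical rollouts for one question: (answer, retrieval calls).
    rollouts = [("Paris", 1), ("Paris", 4), ("Lyon", 2)]
    rewards = [caf_reward(answer, "Paris", n) for answer, n in rollouts]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Under these assumed forms, two rollouts that reach the same correct answer are ranked by how many retrievals they needed, which is the kind of performance-versus-cost trade-off the CAF reward is described as targeting.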
