Retrieval-augmented generation (RAG) enhances large language models by giving them access to external knowledge, and graph-based RAG has emerged as a powerful paradigm for structured retrieval and reasoning. However, existing graph-based methods often over-rely on surface-level node matching and lack explicit causal modeling, which leads to unfaithful or spurious answers. Prior attempts to incorporate causality are typically limited to local or single-document contexts. They also suffer from the information isolation that arises from modular graph structures, which hinders scalability and cross-module causal reasoning. To address these challenges, we propose HugRAG, a framework that rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. HugRAG explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics. Our work establishes a principled foundation for structured, scalable, and causally grounded RAG systems.