Large language models (LLMs) excel at complex reasoning but remain limited by static and incomplete parametric knowledge. Retrieval-augmented generation (RAG) mitigates this by incorporating external knowledge, yet existing RAG methods struggle with knowledge-intensive tasks due to fragmented information and weak modeling of knowledge structure. Graphs offer a natural way to model relationships within knowledge, but LLMs operate on unstructured text and cannot effectively reason over graph-structured data. Recent graph-enhanced RAG (GraphRAG) methods attempt to bridge this gap by constructing tailored graphs and enabling LLMs to reason over them. However, these methods often depend on ad-hoc graph designs, heuristic search, or costly agent pipelines, which hinder scalability and generalization. To address these challenges, we present G-reasoner, a unified framework that integrates graph and language foundation models for scalable reasoning over diverse graph-structured knowledge. Central to our approach is QuadGraph, a standardized four-layer abstraction that unifies heterogeneous knowledge sources into a common graph representation. Building on this, we introduce a 34M-parameter graph foundation model (GFM) that jointly captures graph topology and textual semantics, and integrate it with LLMs to enhance reasoning in downstream applications. To ensure scalability and efficiency, we implement mixed-precision training and distributed message-passing so that the GFM scales across multiple GPUs. Extensive experiments on six benchmarks show that G-reasoner consistently outperforms state-of-the-art baselines, significantly enhances LLM reasoning, and achieves strong efficiency and cross-graph generalization.
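The abstract states that the GFM is trained with mixed-precision training and distributed message-passing to scale across GPUs. The sketch below is a minimal illustration of that general pattern, assuming a PyTorch-style implementation; all class and function names (e.g., `TinyGFM`, `MessagePassingLayer`) are hypothetical and not taken from the G-reasoner codebase. It shows a small message-passing model over node features trained under `torch.autocast` with a gradient scaler; multi-GPU scaling would additionally wrap the model in `DistributedDataParallel` and partition the graph across ranks.

```python
# Illustrative sketch only: mixed-precision training of a small message-passing model.
# Names and architecture are assumptions for exposition, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MessagePassingLayer(nn.Module):
    """Mean-aggregation message passing over a dense adjacency matrix."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)  # guard against isolated nodes
        msg = adj @ x / deg                                  # aggregate neighbor features
        return F.relu(self.proj(msg) + x)                    # transform with residual connection

class TinyGFM(nn.Module):
    """Two message-passing layers followed by a per-node classifier head."""
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.layers = nn.ModuleList([MessagePassingLayer(dim) for _ in range(2)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = layer(x, adj)
        return self.head(x)

def train_step(model, optimizer, scaler, x, adj, labels, use_cuda):
    optimizer.zero_grad(set_to_none=True)
    # Mixed precision: forward/backward in fp16 where safe, fp32 master weights kept by the optimizer.
    with torch.autocast(device_type="cuda", dtype=torch.float16, enabled=use_cuda):
        loss = F.cross_entropy(model(x, adj), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()

if __name__ == "__main__":
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    model = TinyGFM(dim=64, num_classes=4).to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)
    x = torch.randn(100, 64, device=device)                        # node features (e.g., text embeddings)
    adj = (torch.rand(100, 100, device=device) < 0.05).float()     # random sparse adjacency
    labels = torch.randint(0, 4, (100,), device=device)
    print(train_step(model, optimizer, scaler, x, adj, labels, use_cuda))
```

For distributed message-passing, the same step would typically run inside a `torch.distributed` process group, with the model wrapped in `DistributedDataParallel` and each rank holding a partition of the graph; the details of how G-reasoner partitions the QuadGraph are not specified in the abstract.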