GraphMERT: Efficient and Scalable Distillation of Reliable Knowledge Graphs from Unstructured Data

From arXiv. Camera-ready version, published in Transactions on Machine Learning Research (TMLR), 2026. Reviewed on OpenReview: https://openreview.net/forum?id=tnXSdDhvqc

Researchers have pursued neurosymbolic artificial intelligence (AI) applications for nearly three decades. A marriage of the neural and symbolic components can lead to rapid advances in AI. Yet the field has not realized this promise, because most neurosymbolic AI frameworks fail to scale. In addition, the implicit representations and approximate reasoning of purely neural approaches limit interpretability and trust. Knowledge graphs (KGs), a gold-standard representation of explicit semantic knowledge, can address the symbolic side of the problem. However, automatically deriving reliable KGs from text corpora remains an open problem. We address these challenges by introducing GraphMERT, a tiny graphical encoder-only model that distills high-quality KGs from unstructured text corpora and its own internal representations. GraphMERT and its equivalent KG form a modular neurosymbolic stack: neural learning of abstractions and symbolic KGs for verifiable reasoning. GraphMERT + KG is the first efficient and scalable neurosymbolic model to achieve state-of-the-art benchmark accuracy along with superior symbolic representations relative to baselines. Concretely, we target reliable domain-specific KGs that are both (1) factual (with provenance) and (2) valid (ontology-consistent relations with domain-appropriate semantics). When a large language model (LLM), e.g., Qwen3-32B, generates domain-specific KGs, it falls short on reliability due to prompt sensitivity, shallow domain expertise, and hallucinated relations. On text obtained from PubMed papers on diabetes, our 80M-parameter GraphMERT yields a KG with a 69.8% FActScore, whereas a 32B-parameter baseline LLM yields a KG with only a 40.2% FActScore. The GraphMERT KG also attains a higher ValidityScore of 68.8%, versus 43.0% for the LLM baseline.
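The two reliability criteria above (factual with provenance, valid with respect to an ontology) can be made concrete with a small sketch. It assumes each KG triple carries the sentence it was extracted from as provenance, scores validity as the fraction of triples whose relation appears in a toy ontology with type-compatible head and tail entities, and scores factuality in the FActScore style, i.e., as the fraction of triples a judge deems supported by their source. The ontology, entity types, example triples, and the functions `validity_score`, `factuality_score`, and `substring_judge` are hypothetical illustrations, not GraphMERT's actual evaluation pipeline or the paper's exact metric definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    provenance: str  # source sentence the triple was extracted from


# Hypothetical toy ontology: relation -> (allowed head types, allowed tail types).
Ontology = Dict[str, Tuple[Set[str], Set[str]]]


def validity_score(triples: List[Triple], ontology: Ontology,
                   entity_type: Dict[str, str]) -> float:
    """Fraction of triples whose relation is in the ontology and whose
    head/tail entity types match that relation's allowed types."""
    if not triples:
        return 0.0
    valid = 0
    for t in triples:
        spec = ontology.get(t.relation)
        if spec is None:  # relation not defined in the ontology
            continue
        head_types, tail_types = spec
        if entity_type.get(t.head) in head_types and entity_type.get(t.tail) in tail_types:
            valid += 1
    return valid / len(triples)


def factuality_score(triples: List[Triple],
                     is_supported: Callable[[Triple], bool]) -> float:
    """Fraction of triples the supplied judge considers supported by provenance."""
    if not triples:
        return 0.0
    return sum(1 for t in triples if is_supported(t)) / len(triples)


def substring_judge(t: Triple) -> bool:
    """Deliberately crude stand-in for a real judge (human or model):
    accept a triple if both entity strings occur in its provenance sentence."""
    text = t.provenance.lower()
    return t.head.lower() in text and t.tail.lower() in text


# Hypothetical toy data for a diabetes-flavored example.
entity_type = {"metformin": "drug", "type 2 diabetes": "disease",
               "insulin": "hormone", "hba1c": "biomarker"}
ontology: Ontology = {"treats": ({"drug"}, {"disease"}),
                      "lowers": ({"drug"}, {"biomarker"})}
triples = [
    Triple("metformin", "treats", "type 2 diabetes",
           "Metformin is first-line therapy for type 2 diabetes."),
    Triple("metformin", "lowers", "hba1c", "Metformin lowers HbA1c in most patients."),
    # Reversed direction: the disease cannot be the head of "treats".
    Triple("type 2 diabetes", "treats", "metformin",
           "Metformin is first-line therapy for type 2 diabetes."),
    # Relation missing from the ontology, tail absent from the source sentence.
    Triple("insulin", "causes", "metformin", "Insulin is secreted by beta cells."),
]

print(f"{validity_score(triples, ontology, entity_type):.2f}")  # → 0.50
print(f"{factuality_score(triples, substring_judge):.2f}")     # → 0.75
```

The substring judge accepts the reversed third triple because both entity names appear in its source sentence; only the ontology-based validity check rejects it. This illustrates why the two criteria are measured separately: a triple can be grounded in text yet still be semantically invalid.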