Higher-Order Knowledge Representations for Agentic Scientific Reasoning

Scientific inquiry requires systems-level reasoning that integrates heterogeneous experimental data, cross-domain knowledge, and mechanistic evidence into coherent explanations. While Large Language Models (LLMs) offer inferential capabilities, they often depend on retrieval-augmented contexts that lack structural depth. Traditional Knowledge Graphs (KGs) attempt to bridge this gap, yet their pairwise constraints fail to capture the irreducible higher-order interactions that govern emergent physical behavior. To address this, we introduce a methodology for constructing hypergraph-based knowledge representations that faithfully encode multi-entity relationships. Applied to a corpus of ~1,100 manuscripts on biocomposite scaffolds, our framework constructs a global hypergraph of 161,172 nodes and 320,201 hyperedges, revealing a scale-free topology (power-law exponent ~1.23) organized around highly connected conceptual hubs. This representation prevents the combinatorial explosion typical of pairwise expansions and explicitly preserves the co-occurrence context of scientific formulations. We further demonstrate that equipping agentic systems with hypergraph traversal tools, specifically using node-intersection constraints, enables them to bridge semantically distant concepts. By exploiting these higher-order pathways, the system successfully generates grounded mechanistic hypotheses for novel composite materials, such as linking cerium oxide to PCL scaffolds via chitosan intermediates. This work establishes a "teacherless" agentic reasoning system where hypergraph topology acts as a verifiable guardrail, accelerating scientific discovery by uncovering relationships obscured by traditional graph methods.
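To make the idea of traversal under node-intersection constraints concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypergraph stored as a dictionary of hyperedge IDs mapped to node sets, and it treats two hyperedges as adjacent when they share at least `min_shared` nodes. The hyperedges, the function names `pairwise_expansion` and `hyperedge_path`, and the parameter `min_shared` are all hypothetical and chosen only for illustration; the toy data is not drawn from the corpus.

```python
from collections import deque
from itertools import combinations

# Toy hyperedges: hypothetical examples, NOT taken from the paper's corpus.
# Each hyperedge groups all entities that co-occur in one statement.
HYPEREDGES = {
    "e1": frozenset({"cerium oxide", "chitosan", "antioxidant activity"}),
    "e2": frozenset({"chitosan", "antioxidant activity", "wound healing"}),
    "e3": frozenset({"chitosan", "PCL scaffold", "electrospinning"}),
    "e4": frozenset({"PCL scaffold", "electrospinning", "tensile strength"}),
}


def pairwise_expansion(hyperedges):
    """Flatten every hyperedge into its set of unordered node pairs.

    A hyperedge with k nodes contributes k*(k-1)/2 pairs, which is the
    growth that a pairwise knowledge graph has to absorb.
    """
    pairs = set()
    for nodes in hyperedges.values():
        pairs.update(frozenset(p) for p in combinations(sorted(nodes), 2))
    return pairs


def hyperedge_path(hyperedges, source, target, min_shared=1):
    """Breadth-first search over hyperedges from `source` to `target`.

    Two hyperedges are treated as adjacent only if they share at least
    `min_shared` nodes (the assumed node-intersection constraint).
    Returns a list of hyperedge IDs, or None if no path satisfies it.
    """
    start = [e for e, nodes in hyperedges.items() if source in nodes]
    parent = {e: None for e in start}  # doubles as the visited set
    queue = deque(start)
    while queue:
        current = queue.popleft()
        if target in hyperedges[current]:
            path = []
            while current is not None:  # walk back to a start hyperedge
                path.append(current)
                current = parent[current]
            return path[::-1]
        for other, nodes in hyperedges.items():
            shared = hyperedges[current] & nodes
            if other not in parent and len(shared) >= min_shared:
                parent[other] = current
                queue.append(other)
    return None


if __name__ == "__main__":
    print(len(HYPEREDGES), "hyperedges vs",
          len(pairwise_expansion(HYPEREDGES)), "pairwise edges")
    print(hyperedge_path(HYPEREDGES, "cerium oxide", "PCL scaffold"))
    print(hyperedge_path(HYPEREDGES, "cerium oxide", "PCL scaffold", min_shared=2))
```

In this toy setting, the path from cerium oxide to the PCL scaffold passes through two hyperedges that overlap on chitosan, and each step keeps the full co-occurrence context of the statement it came from. Raising `min_shared` tightens the constraint and can eliminate paths that rest on a single shared entity, which is one plausible way such a constraint could act as a guardrail.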
