Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entities, often fail to generalize across domains, and typically overlook the hierarchical nature of scientific knowledge. While general-purpose large language models (LLMs) offer adaptability, they are computationally expensive and yield inconsistent accuracy on specialized tasks. As a result, current KGs are shallow and inconsistent, limiting their utility for exploration and synthesis. We propose a two-stage framework for scalable, zero-shot scientific KG construction. The first stage, Z-NERD, introduces (i) Orthogonal Semantic Decomposition (OSD), which promotes domain-agnostic entity recognition by isolating semantic "turns" in text, and (ii) a Multi-Scale TCQK attention mechanism that captures coherent multi-word entities through n-gram-aware attention heads. The second stage, HGNet, performs relation extraction with hierarchy-aware message passing, explicitly modeling parent, child, and peer relations. To enforce global consistency, we introduce two complementary objectives: a Differentiable Hierarchy Loss to discourage cycles and shortcut edges, and a Continuum Abstraction Field (CAF) Loss that embeds abstraction levels along a learnable axis in Euclidean space. This is the first approach to formalize hierarchical abstraction as a continuous property within standard Euclidean embeddings, offering a simpler alternative to hyperbolic methods. We release SPHERE (https://github.com/basiralab/SPHERE), a multi-domain benchmark for hierarchical relation extraction. Our framework establishes a new state of the art on SciERC, SciER, and SPHERE, improving NER by 8.08% and RE by 5.99% on out-of-distribution tests. In zero-shot settings, gains reach 10.76% for NER and 26.2% for RE.
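To make the idea of a learnable abstraction axis more concrete, the following is a minimal sketch, not the paper's CAF Loss, whose exact form is not given here. It assumes that an entity's abstraction level can be read off as the projection of its Euclidean embedding onto a unit axis, and that a parent should sit at least some margin above its child along that axis. The function names, the hinge form, the margin value, and the toy two-dimensional embeddings are all hypothetical; in a trained model the axis would be a learned parameter rather than a fixed vector.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def abstraction_level(embedding, axis):
    """Scalar abstraction level: projection of an embedding onto the unit axis."""
    u = unit(axis)
    return sum(e * a for e, a in zip(embedding, u))


def hierarchy_margin_loss(parent, child, axis, margin=1.0):
    """Hypothetical hinge penalty (an assumption, not the paper's CAF Loss).

    It is zero once the parent lies at least `margin` above the child
    along the axis, and grows linearly as that ordering is violated.
    """
    gap = abstraction_level(parent, axis) - abstraction_level(child, axis)
    return max(0.0, margin - gap)


# Toy 2-D embeddings, invented for illustration only.
axis = [0.0, 2.0]      # stands in for the learnable axis (normalized internally)
parent = [0.3, 1.5]    # a more abstract concept
child = [0.9, 0.2]     # a more specific concept

print(abstraction_level(parent, axis))
print(hierarchy_margin_loss(parent, child, axis))  # correct ordering
print(hierarchy_margin_loss(child, parent, axis))  # reversed ordering is penalized
```

Because the level is a plain dot product, this kind of objective stays in ordinary Euclidean space and needs no hyperbolic operations, which is the simplification the abstract points to.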