Fine-tuning pre-trained language models (PLMs) has recently shown potential to improve knowledge graph completion (KGC). However, most PLM-based methods encode only textual information, neglecting the varied topological structures of knowledge graphs (KGs). In this paper, we empirically validate the significant relationship between the structural properties of KGs and the performance of PLM-based methods. To leverage this structural knowledge, we propose a Subgraph-Aware Training framework for KGC (SATKGC) that combines (i) subgraph-aware mini-batching, which encourages hard negative sampling, and (ii) a new contrastive learning method that focuses more on harder entities and harder negative triples with respect to structural properties. To the best of our knowledge, this is the first study to comprehensively incorporate the structural inductive bias of subgraphs into fine-tuning PLMs. Extensive experiments on four KGC benchmarks demonstrate the superiority of SATKGC. Our code is available.
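The abstract does not specify how subgraph-aware mini-batching works; the following is only a minimal, hypothetical sketch of the underlying idea, that sampling all triples in a batch from one subgraph makes the in-batch negatives structurally "hard" because they share a neighborhood. All function names here are illustrative and are not from the paper.

```python
import random
from collections import defaultdict

def build_adjacency(triples):
    """Index each triple under both its head and tail entity."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((h, r, t))
        adj[t].append((h, r, t))
    return adj

def subgraph_minibatch(triples, batch_size, seed=0):
    """Collect a mini-batch of triples by breadth-first expansion from a
    random seed entity, so batch members come from one local subgraph.
    (Illustrative only -- not the paper's actual sampling algorithm.)"""
    rng = random.Random(seed)
    adj = build_adjacency(triples)
    start = rng.choice([h for h, _, _ in triples])
    batch, visited, frontier = [], set(), [start]
    while frontier and len(batch) < batch_size:
        entity = frontier.pop(0)
        if entity in visited:
            continue
        visited.add(entity)
        for (h, r, t) in adj[entity]:
            if (h, r, t) not in batch and len(batch) < batch_size:
                batch.append((h, r, t))
                frontier.extend([h, t])  # expand to neighboring entities
    return batch
```

In contrastive fine-tuning, each triple's positive tail then serves as an in-batch negative for the other triples; because all batch triples come from the same neighborhood, those negatives are harder to distinguish than uniformly random ones.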