Enhancing Knowledge Graph Embedding Models with Semantic-driven Loss Functions

Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions computed over a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that not all negative triples should be valued equally. In line with this assumption, we posit that semantically valid negative triples might be high-quality negative triples, and that loss functions should therefore treat them differently from semantically invalid ones. To this end, we propose semantic-driven versions of the three main loss functions for link prediction. In particular, we treat the scores of negative triples differently by injecting background knowledge about relation domains and ranges into the loss functions. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned by different schemas, which demonstrates both the generality and superiority of our approach. Specifically, the proposed loss functions (1) lead to better MRR and Hits@10 values and (2) drive KGEMs towards better semantic awareness. This highlights that semantic information globally improves KGEMs and should thus be incorporated into loss functions. Because domains and ranges of relations are largely available in schema-defined KGs, our approach is both beneficial and widely usable in practice.
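To make the idea concrete, the sketch below shows one way a pairwise hinge loss could treat semantically valid negatives differently from invalid ones. It is an illustration under stated assumptions, not the formulation from the paper: the names `is_semantically_valid` and `semantic_hinge_loss`, the scaling factor `epsilon`, the choice to shrink the margin for valid negatives, and the convention that a higher score means a more plausible triple are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def _satisfies(entity: str, allowed: Optional[Set[str]],
               entity_types: Dict[str, Set[str]]) -> bool:
    """True if the entity has at least one of the allowed classes.

    Assumption: a relation with no declared constraint accepts any entity.
    """
    if allowed is None:
        return True
    return bool(entity_types.get(entity, set()) & allowed)


def is_semantically_valid(triple: Triple,
                          entity_types: Dict[str, Set[str]],
                          domains: Dict[str, Set[str]],
                          ranges: Dict[str, Set[str]]) -> bool:
    """Check the head against the relation's domain and the tail against its range."""
    head, relation, tail = triple
    return (_satisfies(head, domains.get(relation), entity_types)
            and _satisfies(tail, ranges.get(relation), entity_types))


def semantic_hinge_loss(pos_scores: List[float],
                        neg_scores: List[float],
                        neg_valid: List[bool],
                        margin: float = 1.0,
                        epsilon: float = 0.25) -> float:
    """Mean pairwise hinge loss with a label-dependent margin.

    Assumption: semantically valid negatives get the smaller margin
    `margin * epsilon`, so they are pushed away from positives less
    aggressively than semantically invalid negatives.
    """
    total = 0.0
    for pos, neg, valid in zip(pos_scores, neg_scores, neg_valid):
        effective_margin = margin * epsilon if valid else margin
        total += max(0.0, effective_margin + neg - pos)
    return total / len(pos_scores)


# Toy schema and negatives (hypothetical data, for illustration only).
entity_types = {"Paris": {"City"}, "Berlin": {"City"},
                "France": {"Country"}, "Einstein": {"Person"}}
domains = {"capitalOf": {"City"}}
ranges = {"capitalOf": {"Country"}}

negatives = [("Berlin", "capitalOf", "France"),    # wrong, but type-consistent
             ("Einstein", "capitalOf", "France")]  # violates the domain
neg_valid = [is_semantically_valid(t, entity_types, domains, ranges)
             for t in negatives]
loss = semantic_hinge_loss([2.0, 2.0], [1.5, 1.5], neg_valid)
```

With identical scores, only the domain-violating negative contributes to the loss here, because the type-consistent one already clears its smaller margin. Whether valid negatives should receive a smaller or a larger penalty is a design choice that this sketch does not settle.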


