Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions computed over a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that not all negative triples should be valued equally. In line with this commonly adopted assumption, we posit that semantically valid negative triples might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative triples. To this aim, we propose semantic-driven versions of the three most commonly used loss functions for link prediction. In particular, we treat the scores of negative triples differently by injecting background knowledge about relation domains and ranges into the loss functions. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned by different schemas, which demonstrates both the generality and the superiority of our approach. In fact, the proposed loss functions not only lead to better MRR and Hits@10 values, but also drive KGEMs towards better semantic awareness. This highlights that semantic information globally improves KGEMs and should therefore be incorporated into loss functions whenever such information is available.
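The abstract does not spell out the loss formulas. The following is therefore only a minimal sketch of one way to "treat negatives differently", under several assumptions. It uses a pairwise hinge (margin ranking) loss in which a higher score means a more plausible triple. A negative counts as semantically valid when its head and tail match the relation's domain and range, respectively. Semantically valid negatives receive a smaller margin than invalid ones. The helper names (`is_semantically_valid`, `semantic_pairwise_hinge`), the margin values, and the toy schema are all hypothetical illustrations, not the authors' definitions.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def is_semantically_valid(
    triple: Triple,
    entity_types: Dict[str, Set[str]],
    domains: Dict[str, str],
    ranges: Dict[str, str],
) -> bool:
    """Hypothetical check: the head must carry the relation's domain type
    and the tail must carry the relation's range type."""
    h, r, t = triple
    return domains[r] in entity_types.get(h, set()) and ranges[r] in entity_types.get(t, set())


def semantic_pairwise_hinge(
    pos_score: float,
    neg_scores: List[float],
    neg_valid: List[bool],
    margin_invalid: float = 1.0,  # assumed margin for semantically invalid negatives
    margin_valid: float = 0.5,    # assumed (smaller) margin for semantically valid negatives
) -> float:
    """Sketch of a semantic-driven pairwise hinge loss, averaged over negatives.

    Each negative is pushed below the positive score by a margin that depends
    on whether the negative respects the relation's domain and range.
    """
    if not neg_scores or len(neg_scores) != len(neg_valid):
        raise ValueError("need one validity flag per negative score")
    total = 0.0
    for s_neg, valid in zip(neg_scores, neg_valid):
        margin = margin_valid if valid else margin_invalid
        total += max(0.0, margin - pos_score + s_neg)
    return total / len(neg_scores)


# Toy schema (hypothetical): bornIn has domain Person and range City.
entity_types = {"Alice": {"Person"}, "Bob": {"Person"}, "Paris": {"City"}}
domains = {"bornIn": "Person"}
ranges = {"bornIn": "City"}

# Two corruptions of a positive triple such as (Alice, bornIn, Paris).
negatives = [("Bob", "bornIn", "Paris"), ("Alice", "bornIn", "Bob")]
valid = [is_semantically_valid(n, entity_types, domains, ranges) for n in negatives]
print(valid)  # [True, False]

loss = semantic_pairwise_hinge(pos_score=2.0, neg_scores=[1.8, 1.8], neg_valid=valid)
print(loss)  # ≈ 0.55 (0.3 from the valid negative, 0.8 from the invalid one, averaged)
```

Both negatives in the example have the same score. Even so, the semantically invalid one contributes more loss, so training pushes it further below the positive than the semantically valid one. The direction and size of the margin gap are design assumptions here. The paper's actual formulation, and its versions of the other two loss functions, may differ.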