Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions computed over a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that not all negative triples should be valued equally. In line with this assumption, we posit that negative triples that are semantically valid w.r.t. domain and range constraints might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid negative ones. To this end, we propose semantic-driven versions of the three main loss functions for link prediction. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfactory results on three public benchmark KGs underpinned by different schemas, which demonstrates both the generality and superiority of our proposed approach. Specifically, the proposed loss functions (1) lead to better MRR and Hits@10 values and (2) drive KGEMs towards better semantic awareness as measured by the Sem@K metric. This highlights that semantic information improves KGEMs overall, and thus should be incorporated into loss functions. Because domains and ranges of relations are widely available in schema-defined KGs, our approach is both beneficial and widely usable in practice.
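To make the idea concrete, the following is a minimal sketch of how a pairwise hinge loss could treat semantically valid negatives differently from invalid ones. It is an illustration under stated assumptions, not the formulation used in the paper: the helper names (`is_semantically_valid`, `semantic_hinge_loss`), the scaling factor `epsilon`, the toy schema, and the choice to shrink the margin for semantically valid negatives are all hypothetical. Scores are assumed to be higher for more plausible triples.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def is_semantically_valid(
    triple: Triple,
    entity_types: Dict[str, Set[str]],
    domains: Dict[str, str],
    ranges: Dict[str, str],
) -> bool:
    """Check a triple against the domain and range of its relation.

    A relation with no declared domain (or range) is treated as
    unconstrained on that side; this is an assumption of the sketch.
    """
    head, relation, tail = triple
    domain: Optional[str] = domains.get(relation)
    range_: Optional[str] = ranges.get(relation)
    head_ok = domain is None or domain in entity_types.get(head, set())
    tail_ok = range_ is None or range_ in entity_types.get(tail, set())
    return head_ok and tail_ok


def semantic_hinge_loss(
    pos_score: float,
    neg_score: float,
    neg_is_valid: bool,
    margin: float = 2.0,
    epsilon: float = 0.25,
) -> float:
    """Pairwise hinge loss with a margin that depends on the negative.

    Assumption of this sketch: a semantically valid negative only has
    to be separated from the positive by margin * epsilon, whereas a
    semantically invalid one must be separated by the full margin.
    """
    effective_margin = margin * epsilon if neg_is_valid else margin
    return max(0.0, effective_margin + neg_score - pos_score)


# Toy schema (hypothetical data, for illustration only).
entity_types = {
    "Paris": {"City"},
    "Germany": {"Country"},
    "Einstein": {"Person"},
}
domains = {"capitalOf": "City"}
ranges = {"capitalOf": "Country"}

valid_negative = ("Paris", "capitalOf", "Germany")     # respects domain and range
invalid_negative = ("Paris", "capitalOf", "Einstein")  # tail violates the range

loss_valid = semantic_hinge_loss(
    pos_score=1.0,
    neg_score=0.5,
    neg_is_valid=is_semantically_valid(valid_negative, entity_types, domains, ranges),
)
loss_invalid = semantic_hinge_loss(
    pos_score=1.0,
    neg_score=0.5,
    neg_is_valid=is_semantically_valid(invalid_negative, entity_types, domains, ranges),
)
```

With identical scores, the semantically invalid negative incurs the larger penalty in this sketch, so the model is pushed harder to rank it below the positive. Analogous adjustments could be imagined for pointwise losses, for example by changing the label assigned to semantically valid negatives, but the exact form is left to the paper.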