Treat Different Negatives Differently: Enriching Loss Functions with Domain and Range Constraints for Link Prediction

Knowledge graph embedding models (KGEMs) are used for various tasks related to knowledge graphs (KGs), including link prediction. They are trained with loss functions computed over a batch of scored triples and their corresponding labels. Traditional approaches consider the label of a triple to be either true or false. However, recent works suggest that not all negative triples should be valued equally. In line with this assumption, we posit that negative triples that are semantically valid w.r.t. domain and range constraints might be high-quality negative triples. As such, loss functions should treat them differently from semantically invalid ones. To this end, we propose semantic-driven versions of the three main loss functions for link prediction. In an extensive and controlled experimental setting, we show that the proposed loss functions systematically provide satisfying results on three public benchmark KGs underpinned by different schemas, which demonstrates both the generality and superiority of our approach. In fact, the proposed loss functions (1) lead to better MRR and Hits@10 values and (2) drive KGEMs towards better semantic awareness as measured by the Sem@K metric. This highlights that semantic information globally improves KGEMs and thus should be incorporated into loss functions. Because domains and ranges of relations are largely available in schema-defined KGs, our approach is both beneficial and widely usable in practice.
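The abstract does not give the form of the proposed loss functions, so the following is only a minimal sketch of the general idea, under stated assumptions: a negative triple counts as semantically valid when its head has a type in the relation's domain and its tail has a type in the relation's range, and a pairwise hinge loss applies a reduced margin to such negatives. The function names, the `epsilon` scaling factor, and the toy schema are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def _satisfies(entity: str, allowed: Optional[Set[str]],
               entity_types: Dict[str, Set[str]]) -> bool:
    # Assumption: a relation with no declared constraint accepts any entity.
    if allowed is None:
        return True
    return bool(entity_types.get(entity, set()) & allowed)


def is_semantically_valid(triple: Triple,
                          entity_types: Dict[str, Set[str]],
                          domains: Dict[str, Set[str]],
                          ranges: Dict[str, Set[str]]) -> bool:
    """True if the head matches the relation's domain and the tail its range."""
    head, relation, tail = triple
    return (_satisfies(head, domains.get(relation), entity_types)
            and _satisfies(tail, ranges.get(relation), entity_types))


def semantic_hinge_loss(pos_score: float, neg_score: float, neg_is_valid: bool,
                        margin: float = 1.0, epsilon: float = 0.25) -> float:
    """Pairwise hinge loss, assuming a higher score means a more plausible triple.

    Hypothetical scheme: a semantically valid negative only has to be
    separated from the positive by margin * epsilon instead of margin.
    """
    effective_margin = margin * epsilon if neg_is_valid else margin
    return max(0.0, effective_margin - pos_score + neg_score)


# Toy schema (illustrative data, not from any benchmark KG).
entity_types = {"Paris": {"City"}, "Germany": {"Country"}, "Einstein": {"Person"}}
domains = {"capitalOf": {"City"}}
ranges = {"capitalOf": {"Country"}}

valid_negative = ("Paris", "capitalOf", "Germany")     # wrong, but well-typed
invalid_negative = ("Paris", "capitalOf", "Einstein")  # tail violates the range

for negative in (valid_negative, invalid_negative):
    valid = is_semantically_valid(negative, entity_types, domains, ranges)
    loss = semantic_hinge_loss(pos_score=2.0, neg_score=1.5, neg_is_valid=valid)
    print(negative, valid, loss)
```

With identical scores, the well-typed negative incurs no penalty while the ill-typed one does, which is one way a loss could "treat different negatives differently". Analogous changes to the other loss functions mentioned in the abstract would follow the same pattern but are not sketched here.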

