Knowledge Graphs (KGs) are fundamental resources for knowledge-intensive tasks in NLP. Because manually curating KGs is costly, KG Completion (KGC) plays an important role in automatically completing KGs by scoring their links with KG Embedding (KGE). To handle the many entities involved in training, KGE relies on the Negative Sampling (NS) loss, which reduces computational cost through sampling. Since each link appears at most once in a KG, sparsity is an essential and inevitable problem, and the NS loss is no exception. As a remedy, the NS loss in KGE relies on smoothing methods such as Self-Adversarial Negative Sampling (SANS) and subsampling. However, due to the lack of theoretical understanding, it remains unclear which smoothing method is suitable for this purpose. This paper provides theoretical interpretations of the smoothing methods for the NS loss in KGE and derives a new NS loss, Triplet Adaptive Negative Sampling (TANS), that covers the characteristics of the conventional smoothing methods. Experimental results for TransE, DistMult, ComplEx, RotatE, HAKE, and HousE on the FB15k-237, WN18RR, and YAGO3-10 datasets and their sparser subsets show the soundness of our interpretation and the performance gains achieved by TANS.
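As a concrete illustration of the SANS smoothing mentioned above, the following is a minimal NumPy sketch of the self-adversarial NS loss in the style introduced with RotatE. It assumes a score convention where higher scores mean more plausible triplets; the function name, signature, and default hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sans_loss(pos_score, neg_scores, gamma=12.0, alpha=1.0):
    """Sketch of the Self-Adversarial Negative Sampling (SANS) loss.

    pos_score:  scalar score of the positive triplet (higher = more plausible)
    neg_scores: 1-D array of scores for the sampled negative triplets
    gamma:      margin hyperparameter
    alpha:      temperature of the self-adversarial weighting
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # Self-adversarial weights: a softmax over negative scores, so harder
    # (higher-scoring) negatives receive larger weight. In the original
    # formulation these weights are treated as constants (no gradient).
    w = np.exp(alpha * neg_scores)
    w = w / w.sum()

    # Margin-based logistic terms for the positive and weighted negatives.
    pos_term = -np.log(sigmoid(gamma + pos_score))
    neg_term = -(w * np.log(sigmoid(-gamma - neg_scores))).sum()
    return pos_term + neg_term
```

For instance, a well-separated positive (high `pos_score`, low `neg_scores`) yields a loss near zero, while high-scoring negatives dominate the weighted sum and drive the loss up, which is the adaptive behavior that TANS generalizes.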