Subsampling is effective in Knowledge Graph Embedding (KGE) for reducing overfitting caused by the sparsity of Knowledge Graph (KG) datasets. However, current subsampling approaches consider only the frequencies of queries, which consist of entities and their relations. As a result, existing subsampling can underestimate the appearance probabilities of infrequent queries even when the frequencies of their entities or relations are high. To address this problem, we propose Model-based Subsampling (MBS) and Mixed Subsampling (MIX), which estimate these appearance probabilities from the predictions of KGE models. Evaluation on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the proposed subsampling methods improved KG completion performance for the popular KGE models RotatE, TransE, HAKE, ComplEx, and DistMult.
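The contrast between count-based and model-based subsampling can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the formulation used in this work: a query is taken to be a (head, relation) pair, each weight is an inverse power of an estimated query frequency with a hypothetical temperature `alpha`, the mixed variant is a plain linear interpolation with a hypothetical coefficient `lam`, and `toy_score` is a stand-in for a trained KGE model's prediction rather than a real model.

```python
from collections import Counter

# Toy triples (head, relation, tail); purely illustrative data.
triples = [
    ("tokyo", "located_in", "japan"),
    ("tokyo", "located_in", "asia"),
    ("tokyo", "capital_of", "japan"),
    ("kyoto", "located_in", "japan"),
]


def normalize(raw):
    """Scale non-negative weights so that they sum to 1."""
    total = sum(raw)
    return [w / total for w in raw]


def count_based_weights(triples, alpha=0.5):
    """Weight each triple by an inverse power of its query count.

    Assumption: the query of a triple is its (head, relation) pair.
    """
    query_counts = Counter((h, r) for h, r, _ in triples)
    return normalize([query_counts[(h, r)] ** -alpha for h, r, _ in triples])


def model_based_weights(triples, score_fn, alpha=0.5):
    """Same weighting, but with frequencies estimated by score_fn.

    Assumption: score_fn returns a positive appearance estimate per triple.
    """
    return normalize([score_fn(h, r, t) ** -alpha for h, r, t in triples])


def mixed_weights(count_w, model_w, lam=0.5):
    """Linear interpolation of the two weightings (hypothetical mixing rule)."""
    return [lam * m + (1.0 - lam) * c for c, m in zip(count_w, model_w)]


# Stand-in for a KGE model: the product of the relative frequencies of the
# head entity and the relation. A real setup would use model predictions.
head_freq = Counter(h for h, _, _ in triples)
rel_freq = Counter(r for _, r, _ in triples)
n = len(triples)


def toy_score(h, r, t):
    return (head_freq[h] / n) * (rel_freq[r] / n)


count_w = count_based_weights(triples)
model_w = model_based_weights(triples, toy_score)
mix_w = mixed_weights(count_w, model_w)

for triple, c, m, x in zip(triples, count_w, model_w, mix_w):
    print(triple, round(c, 3), round(m, 3), round(x, 3))
```

In this toy data the queries ("tokyo", "capital_of") and ("kyoto", "located_in") each occur once, so the count-based weights cannot tell them apart. An estimate that also draws on entity and relation frequencies is free to weight them differently, which is the kind of information the count-only approach discards.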