Recent work on Graph Neural Networks has demonstrated that self-supervised pretraining can further enhance performance on downstream graph, link, and node classification tasks. However, the efficacy of pretraining tasks has not been fully investigated for downstream large knowledge graph completion tasks. Using a contextualized knowledge graph embedding approach, we investigate five different pretraining signals, constructed using several graph algorithms and no external data, as well as their combination. We leverage the versatility of our Transformer-based model to explore graph structure generation pretraining tasks (i.e., path and k-hop neighborhood generation), which are inapplicable to most graph embedding methods. We further propose a new path-finding algorithm guided by information gain and find that it is the best-performing individual pretraining task across three downstream knowledge graph completion datasets. While using our new path-finding algorithm as a pretraining signal provides 2-3% MRR improvements, we show that pretraining on all signals together gives the best knowledge graph completion results. In a multitask setting that combines all pretraining tasks, our method surpasses recent, strong-performing knowledge graph embedding methods on all metrics for FB15K-237, on MRR and Hit@1 for WN18RR, and on MRR and Hit@10 for JF17K (a knowledge hypergraph dataset).
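The results above are reported in MRR (mean reciprocal rank) and Hit@k, the standard ranking metrics for knowledge graph completion. As a reminder of what they measure, the following is a minimal sketch under stated assumptions: the function names and the example ranks are illustrative only and are not taken from the paper's code or results. Each rank is the 1-based position of the correct entity among all candidates scored for one test query.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test queries (ranks are 1-based)."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test queries whose correct entity is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks for five test queries (illustrative, not real results).
example_ranks = [1, 3, 2, 10, 50]

mrr = mean_reciprocal_rank(example_ranks)
hit1 = hits_at_k(example_ranks, 1)
hit10 = hits_at_k(example_ranks, 10)
print(f"MRR={mrr:.4f}  Hit@1={hit1:.2f}  Hit@10={hit10:.2f}")
```

Because MRR weights each query by the reciprocal of its rank, it is dominated by queries where the correct entity is ranked near the top, so a model can improve MRR and Hit@1 without a matching gain in Hit@10, and vice versa.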