While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
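A minimal NumPy sketch of the encoder-decoder structure described above may help. It is not the paper's ReED definition. R-GCN-style mean aggregation with one weight matrix per relation, the tanh nonlinearity, and a DistMult-style trilinear decoder are illustrative assumptions, and every function name is hypothetical. In this sketch, passing an empty layer list reduces the encoder to a plain embedding lookup, which corresponds to the shallow (non-message-passing) case.

```python
import numpy as np

def relation_aware_layer(h, triplets, W_self, W_rel):
    """One hypothetical relation-aware message-passing step (R-GCN-style assumption).

    h:        (num_entities, d) current entity representations
    triplets: list of (head, relation, tail) index tuples
    W_self:   (d, d) self-loop weight
    W_rel:    (num_relations, d, d) one weight matrix per relation
    """
    num_entities, _ = h.shape
    messages = np.zeros_like(h)
    counts = np.zeros(num_entities)
    for head, rel, tail in triplets:
        # Message from head to tail, transformed by the relation-specific weight.
        messages[tail] += h[head] @ W_rel[rel]
        counts[tail] += 1
    # Mean over incoming edges; entities without in-edges receive a zero message.
    messages /= np.maximum(counts, 1)[:, None]
    return np.tanh(h @ W_self + messages)

def encode(x, triplets, layers):
    """Stack message-passing layers; an empty list returns the raw embeddings."""
    h = x
    for W_self, W_rel in layers:
        h = relation_aware_layer(h, triplets, W_self, W_rel)
    return h

def score(h, rel_emb, head, rel, tail):
    """DistMult-style trilinear score (assumed decoder)."""
    return float(np.sum(h[head] * rel_emb[rel] * h[tail]))

def predict_proba(h, rel_emb, head, rel, tail):
    """Triplet classification: probability that (head, rel, tail) holds."""
    return 1.0 / (1.0 + np.exp(-score(h, rel_emb, head, rel, tail)))

# Usage on a toy graph with random parameters.
rng = np.random.default_rng(0)
num_entities, num_relations, d = 4, 2, 3
x = rng.normal(size=(num_entities, d))
triplets = [(0, 0, 1), (1, 1, 2), (2, 0, 3)]
layers = [(0.1 * rng.normal(size=(d, d)),
           0.1 * rng.normal(size=(num_relations, d, d)))]
rel_emb = rng.normal(size=(num_relations, d))

h = encode(x, triplets, layers)
p = predict_proba(h, rel_emb, 0, 0, 1)
```

Sharing one `W_rel` across relations, or normalizing its rows, are easy variations to try in this setup. They loosely mirror the parameter-sharing and weight-normalization tricks the abstract mentions, though the sketch makes no claim about how the paper's bounds treat them.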