While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
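To make the encoder-decoder split concrete, here is a minimal sketch of one way a relation-aware message passing encoder could feed a triplet classification decoder. It does not reproduce ReED's exact formulation. It assumes an R-GCN-style layer with a self-loop matrix `W_self`, one matrix per relation in `W_rel`, mean aggregation over incoming edges and a tanh activation. It also assumes a DistMult-style trilinear score passed through a sigmoid as the decoder. All function and variable names (`encode`, `distmult_prob`, `W_self`, `W_rel`, the toy entities) are hypothetical. If you apply zero encoder layers, the raw embeddings go straight to the decoder, which is the shape of a shallow scoring model.

```python
import math
import random
from collections import defaultdict

Vector = list[float]
Matrix = list[list[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a d x d matrix by a length-d vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [ai + bi for ai, bi in zip(a, b)]


def random_matrix(d: int, rng: random.Random) -> Matrix:
    """Gaussian-initialised d x d matrix (hypothetical initialisation)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]


def encode(emb, triplets, W_self, W_rel):
    """One relation-aware message-passing layer (assumed R-GCN-style form):

    h_v' = tanh(W_self h_v + (1 / deg_in(v)) * sum_{(u, r, v)} W_rel[r] h_u)
    """
    incoming = defaultdict(list)  # tail entity -> [(head entity, relation), ...]
    for head, rel, tail in triplets:
        incoming[tail].append((head, rel))
    out = {}
    for v, h_v in emb.items():
        agg = matvec(W_self, h_v)  # self-loop term
        msgs = incoming.get(v, [])
        for u, r in msgs:
            msg = matvec(W_rel[r], emb[u])  # relation-specific transform
            agg = add(agg, [m / len(msgs) for m in msg])  # mean over in-edges
        out[v] = [math.tanh(a) for a in agg]
    return out


def distmult_prob(h: Vector, r: Vector, t: Vector) -> float:
    """Triplet classification decoder: sigmoid of a DistMult score <h, r, t>."""
    score = sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))
    return 1.0 / (1.0 + math.exp(-score))


# Toy usage on a hypothetical three-entity knowledge graph.
rng = random.Random(0)
d = 4
entities = ["alice", "bob", "paris"]
relations = ["knows", "lives_in"]
train = [("alice", "knows", "bob"), ("bob", "lives_in", "paris")]

emb = {e: [rng.gauss(0.0, 1.0) for _ in range(d)] for e in entities}
rel_vec = {r: [rng.gauss(0.0, 1.0) for _ in range(d)] for r in relations}
W_self = random_matrix(d, rng)
W_rel = {r: random_matrix(d, rng) for r in relations}

h = encode(emb, train, W_self, W_rel)
p = distmult_prob(h["alice"], rel_vec["lives_in"], h["paris"])
print(f"P(alice lives_in paris) = {p:.3f}")
```

The per-relation matrices in `W_rel` are where parameter-sharing schemes would act. One example is R-GCN's basis decomposition, which expresses each relation matrix as a combination of a few shared bases. Because the number of relation-specific parameters is tied to these matrices, the sketch keeps them as an explicit, separate argument.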