Knowledge graphs (KGs) consist of links that describe relationships between entities. Because manually enumerating all relationships between entities is difficult, completing them automatically is essential for KGs. Knowledge Graph Completion (KGC) is the task of inferring unseen relationships between entities in a KG. Traditional embedding-based KGC methods, such as RESCAL, TransE, DistMult, ComplEx, RotatE, HAKE, and HousE, infer missing links using only the knowledge in the training data. In contrast, recent Pre-trained Language Model (PLM)-based KGC methods also draw on knowledge obtained during pre-training. A PLM-based KGC method can therefore estimate missing links between entities by reusing knowledge memorized during pre-training, without performing any inference. This is problematic because the purpose of a KGC model is to infer unseen links between entities. However, conventional KGC evaluations do not consider inference and memorization abilities separately. Thus, a PLM-based KGC method that achieves high performance in current KGC evaluations may be ineffective in practical applications. To address this issue, we analyze whether PLM-based KGC methods make inferences or merely access memorized knowledge. For this purpose, we propose a method for constructing synthetic datasets specialized for this analysis, and we conclude that PLMs acquire the inference abilities required for KGC through pre-training, even though the performance improvements mostly come from the textual information of entities and relations.