Knowledge graph embedding models (KGEMs) have gained considerable traction in recent years. These models learn vector representations of knowledge graph entities and relations, known as knowledge graph embeddings (KGEs). Learning versatile KGEs is desirable as it makes them useful for a broad range of tasks. However, KGEMs are usually trained for a specific task, which makes their embeddings task-dependent. In parallel, the widespread assumption that KGEMs actually create a semantic representation of the underlying entities and relations (e.g., that they project similar entities closer together than dissimilar ones) has been challenged. In this work, we design heuristics for generating protographs -- small, modified versions of a KG that leverage schema-based information. The learnt protograph-based embeddings are meant to encapsulate the semantics of a KG, and can be leveraged in learning KGEs that, in turn, also better capture semantics. Extensive experiments on various evaluation benchmarks demonstrate the soundness of this approach, which we call Modular and Agnostic SCHema-based Integration of protograph Embeddings (MASCHInE). In particular, MASCHInE helps produce more versatile KGEs that yield substantially better performance for entity clustering and node classification tasks. For link prediction, using MASCHInE has little impact on rank-based performance but increases the number of semantically valid predictions.