A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of the knowledge graph within a link prediction setting. As a result, the embeddings reflect only the structure of a single knowledge graph, and embeddings for different knowledge graphs are not aligned; for example, they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, applications of knowledge graph embeddings such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identifier. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata, covering about 180 million entities, 15 thousand relations, and 1.2 billion triples. We believe our computed embeddings will support the emerging field of graph foundation models. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction suggest that universal knowledge graph embeddings encode better semantics than embeddings computed on a single knowledge graph. For reproducibility, we make our source code and datasets openly available.
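To illustrate the fusion step, the following is a minimal sketch of how triples from two knowledge graphs might be merged along owl:sameAs links so that each entity ends up with a single identifier. It is not the authors' pipeline: the union-find strategy, the function names `find` and `fuse`, the choice of the lexicographically smallest identifier as the canonical one, and the prefixed example identifiers are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"  # assumed spelling of the relation in the input triples


def find(parent: Dict[str, str], x: str) -> str:
    """Return the canonical identifier of x's equivalence class."""
    parent.setdefault(x, x)
    root = x
    while parent[root] != root:
        root = parent[root]
    # Path compression: point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def fuse(triples: Iterable[Triple]) -> List[Triple]:
    """Merge entities linked by owl:sameAs and rewrite all other triples."""
    triples = list(triples)
    parent: Dict[str, str] = {}

    # Pass 1: build equivalence classes from the owl:sameAs links.
    for s, p, o in triples:
        if p == SAME_AS:
            root_s, root_o = find(parent, s), find(parent, o)
            if root_s != root_o:
                # Hypothetical convention: the smaller identifier is canonical.
                keep, drop = sorted((root_s, root_o))
                parent[drop] = keep

    # Pass 2: rewrite subjects and objects, dropping sameAs links and duplicates.
    fused: List[Triple] = []
    seen: Set[Triple] = set()
    for s, p, o in triples:
        if p == SAME_AS:
            continue
        t = (find(parent, s), p, find(parent, o))
        if t not in seen:
            seen.add(t)
            fused.append(t)
    return fused


if __name__ == "__main__":
    example = [
        ("dbr:Berlin", SAME_AS, "wd:Q64"),
        ("dbr:Germany", SAME_AS, "wd:Q183"),
        ("dbr:Berlin", "dbo:country", "dbr:Germany"),
        ("wd:Q64", "wdt:P17", "wd:Q183"),
    ]
    for triple in fuse(example):
        print(triple)
```

After fusion, both the DBpedia and the Wikidata statements refer to the same subject and object identifiers, so a single embedding model trained on the fused triples learns one vector per real-world entity. Relations are left unmerged in this sketch.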