A variety of knowledge graph embedding approaches have been developed. Most of them obtain embeddings by learning the structure of a knowledge graph in a link prediction setting. As a result, the embeddings reflect only the semantics of a single knowledge graph, and embeddings of different knowledge graphs are not aligned: for example, they cannot be used to find similar entities across knowledge graphs via nearest neighbor search. However, applications of knowledge graph embeddings such as entity disambiguation require a more global representation, i.e., a representation that is valid across multiple sources. We propose to learn universal knowledge graph embeddings from large-scale interlinked knowledge sources. To this end, we fuse large knowledge graphs based on the owl:sameAs relation such that every entity is represented by a unique identifier. We instantiate our idea by computing universal embeddings based on DBpedia and Wikidata, yielding embeddings for about 180 million entities and 15 thousand relations, learned from 1.2 billion triples. Moreover, we develop a convenient API to provide embeddings as a service. Experiments on link prediction show that universal knowledge graph embeddings encode better semantics than embeddings computed on a single knowledge graph. For reproducibility, our source code and datasets are openly available at https://github.com/dice-group/Universal_Embeddings
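To illustrate the fusion step, the following is a minimal sketch of how entities linked by owl:sameAs could be collapsed into one identifier before training. It is not taken from the released code: the use of a union-find structure, the names `UnionFind` and `fuse`, the rule of picking the lexicographically smallest identifier as representative, and the toy triples are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


class UnionFind:
    """Disjoint-set structure over entity identifiers (hypothetical helper)."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        # Unseen identifiers start as their own representative.
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Assumption: keep the lexicographically smaller identifier as the
        # representative so that the result is deterministic.
        if rb < ra:
            ra, rb = rb, ra
        self.parent[rb] = ra


def fuse(triples: Iterable[Triple]) -> List[Triple]:
    """Merge sameAs-linked entities and rewrite all other triples."""
    triples = list(triples)
    uf = UnionFind()
    for s, p, o in triples:
        if p == SAME_AS:
            uf.union(s, o)
    fused = set()
    for s, p, o in triples:
        if p == SAME_AS:
            continue  # the links themselves are not kept in the fused graph
        fused.add((uf.find(s), p, uf.find(o)))
    return sorted(fused)


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [
        ("dbr:Berlin", "dbo:country", "dbr:Germany"),
        ("wd:Q64", "wdt:P17", "wd:Q183"),
        ("dbr:Berlin", SAME_AS, "wd:Q64"),
        ("dbr:Germany", SAME_AS, "wd:Q183"),
    ]
    for triple in fuse(toy):
        print(triple)
```

In this sketch both Berlin identifiers map to a single representative, so the triples from the two sources end up attached to the same entity and a single embedding would be learned for it. At the scale reported above, an in-memory dictionary may not be practical; that design choice is made here only to keep the example short.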