Knowledge Graph Embedding (KGE) models learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links; their structure also carries semantics, which are crucial in several downstream tasks such as query answering and reasoning. We introduce the subgraph inference task, in which a model has to generate likely and semantically valid subgraphs. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs whose semantics are expressed as logical rules, so that subgraph inference can be evaluated. We also present the dataset generator that produced the synthetic datasets. We design four novel baseline models, three of which are based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.