Knowledge Graph Embedding (KGE) models learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links; their structure also carries semantics, which is crucial in several downstream tasks such as query answering and reasoning. We introduce the subgraph inference task, in which a model must generate subgraphs that are both likely and semantically valid. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs whose semantics are expressed as logical rules, so that subgraph inference can be evaluated against those rules. We also present the dataset generator that produced the synthetic datasets. We design four novel baseline models, three of which are based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.