Knowledge Graph Embedding (KGE) models learn continuous representations of entities and relations. A key task in the literature is predicting missing links between entities. However, Knowledge Graphs are not just sets of links; their structure also carries semantics, which is crucial in downstream tasks such as query answering and reasoning. We introduce the subgraph inference task, in which a model has to generate subgraphs that are both likely and semantically valid. We propose IntelliGraphs, a set of five new Knowledge Graph datasets. The IntelliGraphs datasets contain subgraphs whose semantics are expressed as logical rules, so that subgraph inference can be evaluated against them. We also present the dataset generator that produced the synthetic datasets. We design four novel baseline models, three of which are based on traditional KGEs. We evaluate their expressiveness and show that these models cannot capture the semantics. We believe this benchmark will encourage the development of machine learning models that emphasize semantic understanding.
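To make the notion of a "semantically valid" subgraph concrete, the following minimal sketch checks a small set of triples against logical rules written as Python predicates. It is only an illustration under stated assumptions: the relation name `parent_of`, the two rules, and the helper `is_semantically_valid` are hypothetical and are not taken from the IntelliGraphs datasets or their generator.

```python
from typing import Callable, Iterable, Set, Tuple

# A triple is (head, relation, tail); a subgraph is a set of triples.
Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], bool]


def no_self_loops(graph: Set[Triple]) -> bool:
    """Hypothetical rule: no entity is related to itself."""
    return all(head != tail for head, _, tail in graph)


def parent_of_is_antisymmetric(graph: Set[Triple]) -> bool:
    """Hypothetical rule: parent_of(x, y) forbids parent_of(y, x)."""
    return not any(
        rel == "parent_of" and (tail, "parent_of", head) in graph
        for head, rel, tail in graph
    )


def is_semantically_valid(graph: Iterable[Triple], rules: Iterable[Rule]) -> bool:
    """Return True only if the subgraph satisfies every rule."""
    triples = set(graph)
    return all(rule(triples) for rule in rules)


# Illustrative rule set and subgraphs (assumptions, not dataset content).
RULES = [no_self_loops, parent_of_is_antisymmetric]

valid_subgraph = {
    ("alice", "parent_of", "bob"),
    ("bob", "parent_of", "carol"),
}
invalid_subgraph = {
    ("alice", "parent_of", "bob"),
    ("bob", "parent_of", "alice"),  # violates antisymmetry
}

if __name__ == "__main__":
    print(is_semantically_valid(valid_subgraph, RULES))
    print(is_semantically_valid(invalid_subgraph, RULES))
```

A check of this kind covers only the validity half of the task. A model for subgraph inference would additionally have to assign high likelihood to the subgraphs it generates, which this sketch does not attempt.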