Knowledge graphs represent facts about real-world entities. Most of these facts are expressed as positive statements. Negative statements are scarce but highly relevant under the open-world assumption, where a missing fact is unknown rather than false. Furthermore, they have been shown to improve the performance of several applications, particularly in the biomedical domain. However, no benchmark dataset supports the evaluation of methods that consider these negative statements. We present a collection of datasets for three relation prediction tasks (protein-protein interaction prediction, gene-disease association prediction, and disease prediction) that aim to circumvent the difficulties in building benchmarks for knowledge graphs with negative statements. These datasets include data from two successful biomedical ontologies, the Gene Ontology and the Human Phenotype Ontology, enriched with negative statements. We also generate knowledge graph embeddings for each dataset with two popular path-based methods and evaluate their performance on each task. The results show that negative statements can improve the performance of knowledge graph embeddings.