A knowledge graph (KG) is a powerful representation of real-world entities and their relations. The vast majority of these relations are defined as positive statements, but the importance of negative statements is increasingly recognized, especially under an Open World Assumption. Explicitly considering negative statements has been shown to improve performance on general tasks such as entity summarization and question answering, as well as on domain-specific tasks such as protein function prediction. However, knowledge graph embedding approaches have so far paid no attention to negative statements, despite their potential to produce more accurate representations of entities in a knowledge graph. We propose a novel approach, TrueWalks, to incorporate negative statements into the knowledge graph representation learning process. In particular, we present a novel walk-generation method that not only differentiates between positive and negative statements but also takes into account the semantic implications of negation in ontology-rich knowledge graphs. This is of particular importance for applications in the biomedical domain, where the inadequacy of embedding approaches regarding negative statements at the ontology level has been identified as a crucial limitation. We evaluate TrueWalks on ontology-rich biomedical knowledge graphs in two predictive tasks based on KG embeddings: protein-protein interaction prediction and gene-disease association prediction. We conduct an extensive analysis over established benchmarks and demonstrate that our method improves the performance of knowledge graph embeddings on both tasks.
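The walk-generation idea described in the abstract can be illustrated with a minimal sketch. It is not the TrueWalks implementation: the abstract does not specify the algorithm, so everything below is an assumption made for illustration. The sketch assumes that positive and negative statements produce separate walks, and it uses the standard subsumption semantics of ontologies: a positive annotation with a class also holds for its superclasses, whereas a negative annotation with a class also holds for its subclasses. The class names, the relation labels (`hasAnnotation`, `hasNotAnnotation`, `subClassOf`, `superClassOf`), and the function names are all hypothetical.

```python
import random
from collections import defaultdict


def build_children(parents):
    """Invert a child -> [parents] map into a parent -> [children] map."""
    children = defaultdict(list)
    for child, class_parents in parents.items():
        for parent in class_parents:
            children[parent].append(child)
    return children


def generate_walk(entity, annotations, neighbours, relation, step_label, depth, rng):
    """Build one walk: entity -> annotated class -> up to `depth` ontology steps."""
    classes = annotations.get(entity, [])
    if not classes:
        return [entity]  # no statements of this polarity for the entity
    current = rng.choice(sorted(classes))
    walk = [entity, relation, current]
    for _ in range(depth):
        options = neighbours.get(current, [])
        if not options:
            break  # reached the top (or bottom) of the hierarchy
        current = rng.choice(sorted(options))
        walk += [step_label, current]
    return walk


def generate_walks(entity, pos, neg, parents, n_walks=4, depth=2, seed=0):
    """Return (positive_walks, negative_walks) for one entity.

    Assumption: positive walks move towards superclasses, negative walks
    move towards subclasses, mirroring how each polarity is inherited.
    """
    rng = random.Random(seed)
    children = build_children(parents)
    pos_walks = [
        generate_walk(entity, pos, parents, "hasAnnotation", "subClassOf", depth, rng)
        for _ in range(n_walks)
    ]
    neg_walks = [
        generate_walk(entity, neg, children, "hasNotAnnotation", "superClassOf", depth, rng)
        for _ in range(n_walks)
    ]
    return pos_walks, neg_walks


# Hypothetical toy ontology (child -> parents) and annotations.
toy_parents = {
    "kinase_binding": ["protein_binding"],
    "protein_binding": ["binding"],
    "dna_binding": ["nucleic_acid_binding"],
    "rna_binding": ["nucleic_acid_binding"],
    "nucleic_acid_binding": ["binding"],
}
toy_pos = {"P1": ["kinase_binding"]}
toy_neg = {"P1": ["nucleic_acid_binding"]}

positive, negative = generate_walks("P1", toy_pos, toy_neg, toy_parents)
for walk in positive + negative:
    print(" -> ".join(walk))
```

In this toy example the positive walks climb from `kinase_binding` towards `binding`, while the negative walks descend from `nucleic_acid_binding` to one of its subclasses and never reach `binding`, since not binding nucleic acids says nothing about binding in general. Walk-based embedding pipelines commonly feed such sequences to a skip-gram style model; how the walks are consumed in TrueWalks is not stated in the abstract.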