Accurate identification of disease genes has consistently been one of the keys to decoding a disease's molecular mechanism. Most current approaches focus on constructing biological networks and applying machine learning, especially deep learning, to identify disease genes, but they ignore the complex relations between entities in biological knowledge graphs. In this paper, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end Knowledge graph completion model for Disease Gene Prediction using interactional tensor decomposition (called KDGene). KDGene introduces an interaction module between the embeddings of entities and relations into tensor decomposition, which effectively enhances information interaction in biological knowledge. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms. Furthermore, a comprehensive biological analysis of the diabetes mellitus case confirms KDGene's ability to identify new and accurate candidate genes. This work proposes a scalable knowledge graph completion framework for identifying disease candidate genes, whose results promise to provide valuable references for further wet-lab experiments.
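To make the general idea concrete, the following is a minimal sketch of tensor-decomposition scoring for knowledge graph completion, with a simple interaction step between the head-entity and relation embeddings applied before scoring. It is an illustration under stated assumptions and not the KDGene implementation: the sigmoid gating in `interact`, the trilinear score, the random (untrained) embeddings, the relation name `associated_with`, and the placeholder gene names are all hypothetical choices made here for exposition.

```python
import math
import random


def make_embeddings(names, dim, rng):
    """Assign each name a random vector (stand-in for trained embeddings)."""
    return {name: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for name in names}


def cp_score(head, rel, tail):
    """Trilinear (CP/DistMult-style) score: sum_i head_i * rel_i * tail_i."""
    return sum(h * r * t for h, r, t in zip(head, rel, tail))


def interact(head, rel):
    """Hypothetical interaction step: gate the relation by the head entity.

    This sigmoid gate is an assumption for illustration only; the source
    does not specify the form of KDGene's interaction module.
    """
    gate = [1.0 / (1.0 + math.exp(-h)) for h in head]
    return [g * r for g, r in zip(gate, rel)]


def interaction_score(head, rel, tail):
    """Score a triple after letting head and relation embeddings interact."""
    return cp_score(head, interact(head, rel), tail)


def rank_genes(disease, relation, genes, entity_emb, relation_emb):
    """Rank candidate genes for (disease, relation, ?) by descending score."""
    h = entity_emb[disease]
    r = relation_emb[relation]
    scored = [(g, interaction_score(h, r, entity_emb[g])) for g in genes]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: placeholder names, not real gene symbols or results.
rng = random.Random(0)
genes = ["GENE_A", "GENE_B", "GENE_C"]
entity_emb = make_embeddings(["diabetes mellitus"] + genes, 8, rng)
relation_emb = make_embeddings(["associated_with"], 8, rng)

ranking = rank_genes("diabetes mellitus", "associated_with", genes,
                     entity_emb, relation_emb)
for gene, score in ranking:
    print(f"{gene}\t{score:.4f}")
```

In this framing, disease gene prediction reduces to completing triples of the form (disease, relation, ?): every candidate gene is scored as the tail entity and the top-ranked genes are returned as candidates. With trained rather than random embeddings, the same ranking loop would produce the candidate list.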