In recent decades, people have been consuming and combining more drugs than before, increasing the number of Drug-Drug Interactions (DDIs). To predict unknown DDIs, recent studies have started incorporating Knowledge Graphs (KGs), since they capture the relationships among entities and so provide better drug representations than a single drug property. In this paper, we propose medicX, an end-to-end framework that integrates several drug features from public drug repositories into a KG and embeds the nodes of the graph using various translation-, factorisation- and Neural Network (NN)-based KG Embedding (KGE) methods. Finally, a Machine Learning (ML) algorithm predicts unknown DDIs. Among the translation- and factorisation-based KGE models, the best-performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network, which obtained an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8. This score is 5.61% better than that of the state-of-the-art model DeepDDI. We also developed a graph auto-encoder model that uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%. We conclude that GNNs have a stronger ability to mine the underlying semantics of the KG than the ComplEx model, and that using higher-dimensional embeddings within the GNN can therefore lead to state-of-the-art performance.