In recent decades, people have been consuming and combining more drugs than before, increasing the number of Drug-Drug Interactions (DDIs). To predict unknown DDIs, recent studies have started incorporating Knowledge Graphs (KGs), since they capture the relationships among entities and thereby provide better drug representations than a single drug property. In this paper, we propose medicX, an end-to-end framework that integrates several drug features from public drug repositories into a KG and embeds the nodes of the graph using various translation-, factorisation- and Neural Network (NN)-based KG Embedding (KGE) methods. A Machine Learning (ML) algorithm then uses these embeddings to predict unknown DDIs. Among the translation- and factorisation-based KGE models, the best-performing combination was the ComplEx embedding method with a Long Short-Term Memory (LSTM) network, which obtained an F1-score of 95.19% on a dataset based on the DDIs found in DrugBank version 5.1.8. This score is 5.61% higher than that of the state-of-the-art model DeepDDI. We also developed a graph auto-encoder model that uses a Graph Neural Network (GNN), which achieved an F1-score of 91.94%. Although this is below the ComplEx-LSTM score, we argue that GNNs show a stronger ability to mine the underlying semantics of the KG than the ComplEx model, and that using higher-dimensional embeddings within the GNN could lead to state-of-the-art performance.
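For readers unfamiliar with ComplEx, the following minimal sketch illustrates how a factorisation-based KGE model of this kind scores a (head, relation, tail) triple: each entity and relation is a complex-valued vector, and the score is the real part of the trilinear product with the conjugated tail. The function name `complex_score`, the toy vectors and the relation name `interacts` are assumptions made for illustration; they are not taken from the medicX implementation, and no trained embeddings are shown.

```python
# Illustrative sketch only: toy embeddings, not values from medicX.

def complex_score(head, relation, tail):
    """ComplEx triple score: Re(sum_k h_k * r_k * conj(t_k))."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("embeddings must share one dimension")
    total = sum(h * r * t.conjugate() for h, r, t in zip(head, relation, tail))
    return total.real


# Hypothetical 2-dimensional complex embeddings for two drugs and one relation.
drug_a = [1 + 2j, 0.5 - 1j]
drug_b = [2 - 1j, 1 + 1j]
interacts = [0 + 1j, 1 + 0j]

score_ab = complex_score(drug_a, interacts, drug_b)
score_ba = complex_score(drug_b, interacts, drug_a)
```

A relation vector with only real components gives the same score in both directions, while imaginary components let the score depend on the order of head and tail. This is why ComplEx can represent both symmetric and asymmetric relations in a KG.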