The identification of cancer genes is a critical yet challenging problem in cancer genomics research. Existing computational methods, including deep graph neural networks, either fail to exploit multilayered gene-gene interactions or provide only limited explanations for their predictions. These methods are restricted to a single biological network, which cannot capture the full complexity of tumorigenesis. Models trained on different biological networks often yield different, and even opposite, cancer gene predictions, hindering their trustworthy adoption. Here, we introduce an Explainable Multilayer Graph Neural Network (EMGNN) approach to identify cancer genes by leveraging multiple gene-gene interaction networks and pan-cancer multi-omics data. Unlike conventional graph learning on a single biological network, EMGNN uses a multilayered graph neural network to learn from multiple biological networks for accurate cancer gene prediction. Our method consistently outperforms all existing methods, with an average 7.15% improvement in area under the precision-recall curve (AUPR) over the current state-of-the-art method. Importantly, EMGNN integrates multiple graphs to prioritize newly predicted cancer genes for which single biological networks give conflicting predictions. For each prediction, EMGNN provides valuable biological insights via both model-level feature importance explanations and molecular-level gene set enrichment analysis. Overall, EMGNN offers a powerful new paradigm of graph learning through modeling the multilayered topological gene relationships and provides a valuable tool for cancer genomics research.
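To make the idea of learning from several gene-gene networks over a shared gene set more concrete, the following is a minimal illustrative sketch, not the EMGNN architecture itself. It assumes a simple scheme in which each network contributes one round of mean neighbor aggregation and the per-network results are then averaged for each gene. The gene identifiers (`G1`, `G2`, `G3`), the network names, the feature values, and the function names `mean_aggregate` and `combine_networks` are all hypothetical.

```python
def mean_aggregate(features, edges):
    """One round of mean neighbor aggregation on a single undirected network.

    Each gene averages its own feature vector with those of its neighbors.
    """
    neighbors = {gene: [gene] for gene in features}  # start with a self-loop
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    dim = len(next(iter(features.values())))
    return {
        gene: [sum(features[n][d] for n in nbrs) / len(nbrs) for d in range(dim)]
        for gene, nbrs in neighbors.items()
    }


def combine_networks(features, networks):
    """Average each gene's per-network representation across all networks.

    Assumption: plain averaging stands in for whatever cross-network
    combination a real multilayer model would learn.
    """
    per_network = [mean_aggregate(features, edges) for edges in networks.values()]
    dim = len(next(iter(features.values())))
    return {
        gene: [
            sum(rep[gene][d] for rep in per_network) / len(per_network)
            for d in range(dim)
        ]
        for gene in features
    }


# Hypothetical toy data: three genes with two-dimensional features,
# connected differently in two interaction networks.
features = {"G1": [1.0, 0.0], "G2": [0.0, 1.0], "G3": [1.0, 1.0]}
networks = {
    "network_a": [("G1", "G2")],
    "network_b": [("G1", "G3"), ("G2", "G3")],
}

combined = combine_networks(features, networks)
for gene, vector in combined.items():
    print(gene, vector)
```

In this toy setting, `G3` is isolated in `network_a` but connected to both other genes in `network_b`, so its combined representation reflects neighborhood information that a model restricted to `network_a` alone would never see.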