Retrieving gene functional networks from knowledge databases is challenging because of the mismatch between disease-level networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or to explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between disease subtypes. Graphs generated from such representations can be regarded as subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules. First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with average improvements of 30.6%, 21.0%, 20.1%, and 56.6% on four graph evaluation metrics across four cancer datasets. In addition, we conduct a biological simulation experiment to assess how the behavior of genes selected from over 11,000 candidates affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts. The GeSubNet resource is available at https://anonymous.4open.science/r/GeSubNet/