Genes are fundamental to analyzing biological systems, and many recent works have used gene expression data with deep learning models for various biological tasks. Despite their promising performance, deep neural networks struggle to provide human-understandable biological insights because of their black-box nature. Recently, some works have integrated biological knowledge into neural networks to improve both the transparency and the performance of their models. However, these methods can incorporate only part of the available biological knowledge, which leads to suboptimal performance. In this paper, we propose the Biological Factor Regulatory Neural Network (BFReg-NN), a generic framework for modeling the relations among biological factors in cell systems. BFReg-NN starts from gene expression data and can incorporate most existing biological knowledge into the model. This includes the regulatory relations among genes or proteins (e.g., gene regulatory networks (GRNs) and protein-protein interaction (PPI) networks) and the hierarchical relations among genes, proteins, and pathways (e.g., several genes or proteins belong to the same pathway). Moreover, because of its white-box nature, BFReg-NN can also provide new, biologically meaningful insights. Experimental results on different gene expression-based tasks show that BFReg-NN outperforms the baselines. Our case studies further show that the key insights discovered by BFReg-NN are consistent with the biological literature.
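The two kinds of prior knowledge named above, regulatory relations (GRN/PPI edges) and hierarchical gene-to-pathway membership, are commonly encoded by allowing connections only where a known relation exists. The sketch below illustrates that general idea under our own assumptions; it is not BFReg-NN's actual architecture. The gene names `G1`–`G3`, pathways `P1`/`P2`, the fixed edge weights, the mean aggregation, and the helper functions `regulate` and `pathway_scores` are all hypothetical. Activations and training are omitted for clarity.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # hypothetical (regulator, target) pair from a prior GRN/PPI


def regulate(expr: Dict[str, float],
             edges: List[Edge],
             weights: Dict[Edge, float]) -> Dict[str, float]:
    """One synchronous propagation step restricted to known edges.

    Each gene keeps its own value and adds weighted input only from its
    known regulators; gene pairs without a prior edge never interact.
    """
    out = dict(expr)
    for reg, tgt in edges:
        out[tgt] += weights[(reg, tgt)] * expr[reg]  # read pre-update values
    return out


def pathway_scores(gene_vals: Dict[str, float],
                   membership: Dict[str, Set[str]]) -> Dict[str, float]:
    """Masked gene-to-pathway layer: each pathway sees only its member genes.

    Mean aggregation is an illustrative assumption; a trained model would
    typically learn a weight per (gene, pathway) connection instead.
    """
    return {p: sum(gene_vals[g] for g in sorted(genes)) / len(genes)
            for p, genes in membership.items()}


# Hypothetical toy inputs
expr = {"G1": 1.0, "G2": 2.0, "G3": 0.5}
edges = [("G1", "G2"), ("G2", "G3")]
weights = {("G1", "G2"): 0.5, ("G2", "G3"): -1.0}
membership = {"P1": {"G1", "G2"}, "P2": {"G2", "G3"}}

genes = regulate(expr, edges, weights)
print(genes)                              # {'G1': 1.0, 'G2': 2.5, 'G3': -1.5}
print(pathway_scores(genes, membership))  # {'P1': 1.75, 'P2': 0.5}
```

Because every nonzero connection corresponds to a documented relation, each pathway score can be traced back to specific member genes and regulatory edges. This traceability is the kind of white-box property the abstract refers to.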