This work presents an omics-driven modeling pipeline that integrates machine-learning tools to facilitate the dynamic modeling of multiscale biological systems. Random forests and permutation feature importance are proposed to mine omics datasets, guiding feature selection and dimensionality reduction for dynamic modeling. Continuous and differentiable machine-learning functions can be trained to link the reduced omics feature set to key components of the dynamic model, resulting in a hybrid model. As proof of concept, we apply this framework to a high-dimensional proteomics dataset of $\textit{Saccharomyces cerevisiae}$. After identifying key intracellular proteins that correlate with cell growth, targeted dynamic experiments are designed, and key model parameters are captured as functions of the selected proteins using Gaussian processes. This approach captures the dynamic behavior of yeast strains under varying proteome profiles while estimating the uncertainty in the hybrid model's predictions. The outlined modeling framework is adaptable to other scenarios, such as integrating additional layers of omics data for more advanced multiscale biological systems, or employing alternative machine-learning methods to handle larger datasets. Overall, this study outlines a strategy for leveraging omics data to inform multiscale dynamic modeling in systems biology and bioprocess engineering.
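To make the three stages of the pipeline concrete, the following is a minimal sketch on synthetic data, not the authors' implementation. Several pieces are assumptions introduced only for illustration: the simulated "proteomics" table and its two informative proteins (indices 3 and 17), the use of scikit-learn and SciPy, the choice of a logistic growth equation as the dynamic model, the carrying capacity and initial biomass, and the use of a single GP-predicted growth rate as the "key model parameter". The abstract does not specify any of these.

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# --- Synthetic stand-in for a proteomics table (NOT the paper's data) ---
n_strains, n_proteins = 200, 50
X = rng.normal(size=(n_strains, n_proteins))  # scaled protein abundances
# Assumed ground truth: only proteins 3 and 17 affect the growth rate (1/h).
mu_obs = (0.4 + 0.10 * np.tanh(X[:, 3]) - 0.05 * X[:, 17]
          + rng.normal(scale=0.01, size=n_strains))

# --- Step 1: random forest + permutation importance on held-out strains ---
X_tr, X_te, y_tr, y_te = train_test_split(X, mu_obs, test_size=0.25, random_state=0)
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
pfi = permutation_importance(forest, X_te, y_te, n_repeats=20, random_state=0)
selected = np.sort(np.argsort(pfi.importances_mean)[-2:])  # keep the top 2 proteins

# --- Step 2: Gaussian process mapping selected proteins -> growth rate ---
kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0, 1.0]) + WhiteKernel(1e-2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gp.fit(X[:, selected], mu_obs)

# --- Step 3: hybrid model, logistic growth with a GP-predicted rate ---
def simulate(profile, x0=0.05, capacity=10.0, t_end=12.0, n_draws=50):
    """Integrate dX/dt = mu(p) * X * (1 - X/K) for draws of mu from the GP."""
    mean, std = gp.predict(np.atleast_2d(profile), return_std=True)
    # Sample growth rates from the GP predictive distribution at this profile.
    mu = np.random.default_rng(1).normal(mean[0], std[0], size=n_draws)
    t = np.linspace(0.0, t_end, 49)
    rhs = lambda _t, x: mu * x * (1.0 - x / capacity)  # one state per draw
    sol = solve_ivp(rhs, (0.0, t_end), np.full(n_draws, x0), t_eval=t)
    return t, sol.y, mean[0], std[0]

# Hypothetical new strain: abundances of the two selected proteins.
t, biomass, mu_mean, mu_std = simulate([0.5, -0.5])
lo, hi = np.percentile(biomass, [5, 95], axis=0)
print("selected proteins:", selected)
print(f"mu = {mu_mean:.3f} +/- {mu_std:.3f} 1/h")
print(f"biomass at t = {t[-1]:.0f} h: 5-95% band [{lo[-1]:.2f}, {hi[-1]:.2f}]")
```

Computing the permutation importance on held-out strains rather than on the training set reduces the chance that proteins the forest merely overfits to are ranked highly. Sampling the rate from the GP predictive distribution is one simple way to carry parameter uncertainty through to the simulated trajectories; other propagation schemes would fit the same structure.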