Model-based approaches for (bio)process systems often suffer from incomplete knowledge of the underlying physical, chemical, or biological laws. Universal differential equations, which embed neural networks within differential equations, have emerged as powerful tools to learn this missing physics from experimental data. However, neural networks are inherently opaque, motivating their post-processing via symbolic regression to obtain interpretable mathematical expressions. Genetic algorithm-based symbolic regression is a popular approach for this post-processing step, but provides only point estimates and cannot quantify the confidence we should place in a discovered equation. We address this limitation by applying Bayesian symbolic regression, which uses Reversible Jump Markov Chain Monte Carlo to sample from the posterior distribution over symbolic expression trees. This approach naturally quantifies uncertainty in the recovered model structure. We demonstrate the methodology on a Lotka-Volterra predator-prey system and then show how a well-designed experiment leads to lower uncertainty in a fed-batch bioreactor case study.