RAPTOR-GEN: RApid PosTeriOR GENerator for Bayesian Learning in Biomanufacturing

Biopharmaceutical manufacturing is vital to public health but lacks the agility for rapid, on-demand production of biotherapeutics due to the complexity and variability of bioprocesses. To overcome this, we introduce the RApid PosTeriOR GENerator (RAPTOR-GEN), a mechanism-informed Bayesian learning framework designed to accelerate the development of intelligent digital twins from sparse and heterogeneous experimental data. The framework is built on a multi-scale probabilistic knowledge graph (pKG), formulated as a stochastic differential equation (SDE)-based foundational model that captures the nonlinear dynamics of bioprocesses. RAPTOR-GEN consists of two ingredients: (i) an interpretable metamodel that integrates the linear noise approximation (LNA), exploiting the structural information of bioprocessing mechanisms, with a sequential learning strategy that fuses heterogeneous and sparse data, enabling inference of latent state variables and an explicit approximation of the otherwise intractable likelihood function; and (ii) an efficient Bayesian posterior sampling method that uses Langevin diffusion (LD) to accelerate posterior exploration by exploiting the gradients of the derived likelihood. It generalizes the LNA approach to circumvent the challenge of step-size selection, facilitating robust learning of mechanistic parameters with provable finite-sample performance guarantees. We develop a fast and robust RAPTOR-GEN algorithm with controllable error. Numerical experiments demonstrate its effectiveness in uncovering the underlying regulatory mechanisms of biomanufacturing processes.
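
To make the two ingredients more concrete, here is a minimal sketch that pairs an LNA-style Gaussian transition likelihood with a gradient-based Langevin sampler on a deliberately tiny problem. It is not the RAPTOR-GEN algorithm. Everything in it is an assumption chosen for illustration: the one-dimensional stochastic logistic growth model dX = r X (1 - X/K) dt + S sqrt(X) dW, the known constants K and S, the truncated N(1, 1) prior on r, the fixed step size, the choice of the Metropolis-adjusted Langevin algorithm (MALA) as the Langevin sampler, and every function name. The abstract describes exploiting gradients of the derived likelihood; this sketch approximates them by finite differences instead. It also assumes every state is observed exactly, whereas the framework described above infers latent states from sparse, heterogeneous data. Between consecutive observations, the LNA mean follows dm/dt = f(m) and its variance follows dV/dt = 2 f'(m) V + g(m)^2, restarted from the observed state with V = 0.

```python
import math
import random

# Hypothetical toy model (not the paper's pKG): stochastic logistic cell growth
#   dX = f(X) dt + g(X) dW,  f(x) = r x (1 - x/K),  g(x) = S sqrt(x)
K, S = 10.0, 0.1                  # carrying capacity and noise scale, assumed known
R_TRUE, DT, N_OBS = 0.5, 0.5, 40  # true growth rate, sampling interval, number of transitions


def drift(x, r):
    return r * x * (1.0 - x / K)


def simulate(r, x0=1.0, n_sub=50, seed=0):
    """Euler-Maruyama path of the SDE, recorded every DT time units."""
    rng = random.Random(seed)
    h = DT / n_sub
    x, xs = x0, [x0]
    for _ in range(N_OBS):
        for _ in range(n_sub):
            x += drift(x, r) * h + S * math.sqrt(max(x, 1e-8) * h) * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


def lna_transition(x_start, r, n_sub=10):
    """Gaussian LNA approximation of X(t + DT) given X(t) = x_start.

    Integrates dm/dt = f(m) and dV/dt = 2 f'(m) V + g(m)^2 from m = x_start,
    V = 0 with explicit Euler substeps; returns (mean, variance).
    """
    h = DT / n_sub
    m, v = x_start, 0.0
    for _ in range(n_sub):
        dfdm = r * (1.0 - 2.0 * m / K)
        m, v = m + drift(m, r) * h, v + (2.0 * dfdm * v + S * S * max(m, 1e-8)) * h
    return m, v


def log_post(r, xs):
    """LNA log-likelihood of the observed path plus a hypothetical N(1, 1) prior on r > 0."""
    if r <= 0.0:
        return -math.inf
    total = -0.5 * (r - 1.0) ** 2
    for a, b in zip(xs[:-1], xs[1:]):
        m, v = lna_transition(a, r)
        total -= 0.5 * (math.log(2.0 * math.pi * v) + (b - m) ** 2 / v)
    return total


def grad_log_post(r, xs, eps=1e-5):
    # Central finite difference; a stand-in for an analytically derived gradient.
    return (log_post(r + eps, xs) - log_post(r - eps, xs)) / (2.0 * eps)


def mala(xs, r0, step, n_iter, seed=1):
    """Metropolis-adjusted Langevin algorithm targeting the posterior of r."""
    rng = random.Random(seed)
    r, lp, g = r0, log_post(r0, xs), grad_log_post(r0, xs)
    samples, accepted = [], 0
    for _ in range(n_iter):
        # Langevin proposal: r' = r + (step / 2) * grad log p(r) + sqrt(step) * N(0, 1)
        prop = r + 0.5 * step * g + math.sqrt(step) * rng.gauss(0.0, 1.0)
        lp_prop = log_post(prop, xs)
        if lp_prop > -math.inf:
            g_prop = grad_log_post(prop, xs)
            # Log densities of the forward and reverse Gaussian proposals
            log_q_fwd = -((prop - r - 0.5 * step * g) ** 2) / (2.0 * step)
            log_q_rev = -((r - prop - 0.5 * step * g_prop) ** 2) / (2.0 * step)
            if math.log(1.0 - rng.random()) < lp_prop - lp + log_q_rev - log_q_fwd:
                r, lp, g = prop, lp_prop, g_prop
                accepted += 1
        samples.append(r)
    return samples, accepted / n_iter


if __name__ == "__main__":
    data = simulate(R_TRUE)
    chain, acc_rate = mala(data, r0=0.3, step=5e-4, n_iter=600)
    kept = chain[100:]
    print(f"acceptance rate = {acc_rate:.2f}, posterior mean of r = {sum(kept) / len(kept):.3f}")
```

Running the script prints the acceptance rate and the posterior mean of r, which should land near R_TRUE for this simulated path. The Metropolis correction removes the discretization bias of a plain Langevin update, but the step size is still hand-picked for this toy posterior; too large a step makes proposals overshoot and get rejected.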
