Parameter calibration is essential for reducing uncertainty and improving predictive fidelity in physics-based models, yet it is often limited by the high computational cost of model evaluations. Bayesian calibration methods provide a principled framework for combining prior information with data while rigorously quantifying uncertainty. In this work, we compare four emulator-based Bayesian calibration strategies: Calibrate-Emulate-Sample (CES), History Matching (HM), Bayesian Optimal Experimental Design (BOED), and a goal-oriented extension of BOED (GBOED). The proposed GBOED formulation explicitly targets information gain with respect to the calibration posterior, aligning design decisions with downstream inference. We assess the methods on accuracy and uncertainty quantification metrics, on convergence behavior under increasing computational budgets, and on practical considerations such as implementation complexity and robustness. For the Lorenz '96 system, CES, HM, and GBOED all yield strong calibration performance, even with limited numbers of model evaluations, while standard BOED generally underperforms in this setting. Differences among the strongest methods are modest and shrink further as computational budgets increase. For the two-layer quasi-geostrophic system, all methods produce reasonable posterior estimates, and convergence behavior is more consistent across methods. Overall, our results indicate that multiple emulator-based calibration strategies can perform comparably well when applied appropriately, so that method selection is often guided more by computational and practical considerations than by accuracy alone. These findings highlight both the limitations of standard BOED for calibration and the promise of goal-oriented and iterative approaches for efficient Bayesian inference in complex dynamical systems.