Bayesian inference provides a principled framework for probabilistic reasoning. When inference is performed in two steps, uncertainty propagation is crucial for accounting for all sources of uncertainty and variability. This is particularly important when both aleatoric uncertainty, caused by variability in the data, and epistemic uncertainty, arising from incomplete knowledge or missing data, are present. Examples include surrogate models and missing data problems. In surrogate modeling, the surrogate serves as a simplified approximation of a resource-intensive, costly simulation, and the uncertainty from fitting the surrogate can be propagated with a two-step procedure. In modeling with missing data, methods such as Multivariate Imputation by Chained Equations (MICE) generate multiple imputed datasets to account for imputation uncertainty. Both approaches, however, are computationally expensive, because a separate model must be fitted for each draw of the surrogate parameters or for each imputed dataset, respectively. To address this, we propose an efficient two-step approach that reduces computational overhead while maintaining accuracy. By selecting a representative subset of draws or imputations, we construct a mixture distribution from which the desired posteriors are approximated using Pareto smoothed importance sampling. For more complex scenarios, the approximation is further refined with importance weighted moment matching and an iterative procedure that broadens the mixture distribution to better capture diverse posterior distributions.
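The core idea (fit only a few representative posteriors, pool them into a mixture, and reweight the mixture draws toward each remaining target with Pareto smoothed importance sampling) can be illustrated with a minimal pure-Python sketch. It rests on toy assumptions that are not from the abstract. The stage-2 posterior p(θ | y, φ) is taken to be Gaussian with a hypothetical closed-form mean 1 + 0.3φ and sd 0.5, so exact draws replace MCMC fits and the mixture density is available exactly. In practice, component posteriors are usually known only through MCMC draws and unnormalized densities, which this sketch does not address. The representative subset is chosen as evenly spaced order statistics of the stage-1 draws, a hypothetical rule. PSIS follows the Zhang and Stephens generalized Pareto fit with weak shrinkage of k toward 0.5, as in the `loo` package. All function and variable names (`stage2_posterior`, `reps`, `psis`, `gpd_fit`) are illustrative, and importance weighted moment matching and the iterative broadening step are not shown.

```python
import math
import random


def normal_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd) evaluated at x."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2 * math.pi)


def logsumexp(vals):
    """Numerically stable log(sum(exp(vals)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def gpd_fit(x):
    """Zhang-Stephens estimate of the generalized Pareto shape k and scale sigma
    for ascending, positive exceedances x; k is then shrunk toward 0.5."""
    n = len(x)
    m = 30 + int(math.sqrt(n))
    x_q = x[int(n / 4 + 0.5) - 1]  # roughly the first-quartile exceedance
    bs = [1 / x[-1] + (1 - math.sqrt(m / (j + 0.5))) / (3 * x_q) for j in range(m)]
    ks = [sum(math.log1p(-b * xi) for xi in x) / n for b in bs]
    prof = [n * (math.log(-b / k) - k - 1) for b, k in zip(bs, ks)]  # profile log-lik
    norm = logsumexp(prof)
    b_hat = sum(math.exp(p - norm) * b for p, b in zip(prof, bs))  # posterior mean of b
    k = sum(math.log1p(-b_hat * xi) for xi in x) / n
    sigma = -k / b_hat
    return (n * k + 5) / (n + 10), sigma  # weakly informative shrinkage of k


def psis(log_ratios):
    """Pareto smoothed importance sampling: normalized weights and k-hat."""
    s = len(log_ratios)
    top = max(log_ratios)
    lw = [v - top for v in log_ratios]  # largest raw log weight becomes 0
    m = min(math.ceil(0.2 * s), math.ceil(3 * math.sqrt(s)))  # tail length
    order = sorted(range(s), key=lambda i: lw[i])
    tail = order[s - m:]  # indices of the m largest weights, ascending
    cutoff = lw[order[s - m - 1]]
    k, sigma = gpd_fit([math.exp(lw[i]) - math.exp(cutoff) for i in tail])
    for j, i in enumerate(tail):
        # Replace the tail weights, in ascending order, by GPD quantiles.
        q = sigma * math.expm1(-k * math.log1p(-(j + 0.5) / m)) / k
        lw[i] = min(math.log(q + math.exp(cutoff)), 0.0)  # truncate at max raw weight
    total = logsumexp(lw)
    return [math.exp(v - total) for v in lw], k


def stage2_posterior(phi):
    """Hypothetical closed-form stage-2 posterior (mean, sd) of theta given phi."""
    return 1.0 + 0.3 * phi, 0.5


rng = random.Random(1)
phis = sorted(rng.gauss(0.0, 1.0) for _ in range(40))  # stage-1 draws
K, N = 4, 1000
reps = [phis[int((i + 0.5) * len(phis) / K)] for i in range(K)]  # representative subset

# Exact draws from the K representative posteriors stand in for K MCMC fits.
theta = [rng.gauss(*stage2_posterior(phi)) for phi in reps for _ in range(N)]
# Log density of the equal-weight mixture q(theta) of those K posteriors.
log_q = [logsumexp([normal_logpdf(t, *stage2_posterior(phi)) for phi in reps]) - math.log(K)
         for t in theta]

results = []  # (exact mean, PSIS estimate, k-hat) for every stage-1 draw
for phi in phis:
    mu, sd = stage2_posterior(phi)
    w, khat = psis([normal_logpdf(t, mu, sd) - lq for t, lq in zip(theta, log_q)])
    results.append((mu, sum(wi * t for wi, t in zip(w, theta)), khat))

print(max(abs(mu - est) for mu, est, _ in results), max(k for _, _, k in results))
```

Here all 40 targets reuse the same 4000 mixture draws, while only K = 4 posteriors are actually sampled. A k-hat above roughly 0.7 for some target would indicate that the mixture covers that posterior poorly, which is presumably where the refinements mentioned above (moment matching, broadening the mixture) would come in.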