Scientific computer simulations cannot represent all scales in realistic applications. To bridge this model-data gap, parameters are injected into models and constrained with noisy data using Bayesian inversion. To reduce the number of simulator evaluations, which can reach $10^5$ or more, modern approaches combine dimension reduction with emulation of the forward map (which contains the simulator). Because both model evaluations and data are scarce, this dimension reduction becomes critical to posterior sampling performance. Recent work on likelihood-informed subspaces (LIS) truncates to informative directions by optimizing bounds on information loss; although these subspaces are mathematically well adapted to sampling, they are often restrictive in practice. In this work, we provably generalize this methodology to $\alpha$-tempered (i.e., annealed or power-posterior) distributions for $\alpha \in [0,1]$. We provide theory to build partially informed spaces, termed $\alpha$-LIS, and show how $\alpha < 1$ can often produce near-optimal spaces. In addition, we focus on applying $\alpha$-LIS to practical cases where the available data are severely limited and noisy. We propose and test extensions that utilize data from the entire sequence of distributions $\alpha_0 < \dots < \alpha_k$, and we use simple approximations of model gradients so that our approach can be used to emulate forward maps of chaotic or stochastic systems, where derivatives are unavailable or uninformative due to noise. In experiments, our accumulated approach is much more robust in these challenging circumstances than the theoretically optimal $\alpha = 1$.
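A minimal Python sketch of an accumulated, derivative-free LIS construction is given below. It is not the authors' implementation, and several ingredients are assumptions. First, it uses a linear toy forward map $G(x) = Ax$ with prior $\mathcal{N}(0, I)$ and noise covariance $\Gamma = \sigma^2 I$, so each tempered density $\pi_\alpha(x) \propto \pi_0(x)\,L(x)^\alpha$ is Gaussian and can be sampled exactly. Second, it uses a common LIS-style diagnostic matrix $H = \mathbb{E}[J^\top \Gamma^{-1} J]$ averaged uniformly over the levels $\alpha_0 < \dots < \alpha_k$. Third, a least-squares ensemble fit stands in for the "simple approximation of model gradients". Finally, it truncates at the smallest rank whose discarded eigenvalues sum to at most a tolerance $\varepsilon$. The helper names `tempered_gaussian_samples`, `ensemble_jacobian`, and `accumulated_lis` are hypothetical. In a realistic setting, the exact Gaussian sampler would be replaced by MCMC or an ensemble method at each $\alpha$.

```python
import numpy as np


def tempered_gaussian_samples(A, y, noise_var, alpha, n, rng):
    """Exact samples from an alpha-tempered posterior of a LINEAR toy model.

    Assumes prior N(0, I) and likelihood exp(-||y - A x||^2 / (2 noise_var)),
    so pi_alpha is Gaussian with precision I + (alpha / noise_var) A^T A.
    """
    d = A.shape[1]
    precision = np.eye(d) + (alpha / noise_var) * (A.T @ A)
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # symmetrize against round-off
    mean = cov @ ((alpha / noise_var) * (A.T @ y))
    return rng.multivariate_normal(mean, cov, size=n)


def ensemble_jacobian(X, G):
    """Derivative-free gradient surrogate: least-squares fit of G ~ c + X J^T.

    X: (n, d) inputs, G: (n, m) noisy outputs; returns J with shape (m, d).
    """
    Xc = X - X.mean(axis=0)
    Gc = G - G.mean(axis=0)
    Jt, *_ = np.linalg.lstsq(Xc, Gc, rcond=None)
    return Jt.T


def accumulated_lis(A, y, noise_var, model_noise_std, alphas, n_per_level, eps, seed=0):
    """Hypothetical 'accumulated' LIS: pool diagnostic matrices over all alpha levels."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    H = np.zeros((d, d))
    for alpha in alphas:
        X = tempered_gaussian_samples(A, y, noise_var, alpha, n_per_level, rng)
        # Stochastic "simulator": pointwise derivatives would be swamped by this noise.
        G = X @ A.T + model_noise_std * rng.standard_normal((n_per_level, m))
        J = ensemble_jacobian(X, G)
        H += J.T @ J / noise_var  # Gauss-Newton-type term J^T Gamma^{-1} J
    H /= len(alphas)

    eigvals, eigvecs = np.linalg.eigh(H)
    order = np.argsort(eigvals)[::-1]  # sort eigenpairs in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # tail[r] = sum of the eigenvalues discarded when keeping the first r directions
    tail = np.append(np.cumsum(eigvals[::-1])[::-1], 0.0)
    r = int(np.argmax(tail <= eps))  # smallest r whose discarded mass is <= eps
    return eigvals, eigvecs[:, :r], r


# Toy problem: 10 parameters, data informative only along the first two axes.
A = np.zeros((2, 10))
A[0, 0], A[1, 1] = 5.0, 3.0
y = np.array([1.0, -0.5])
eigvals, V, r = accumulated_lis(A, y, noise_var=0.1, model_noise_std=0.5,
                                alphas=[0.0, 0.25, 0.5, 1.0], n_per_level=1000, eps=1.0)
print("retained dimension:", r)
```

With data informative only along the first two coordinates, the sketch should retain a two-dimensional subspace aligned with those axes, even though every simulator call is corrupted by noise that would make naive finite differences unreliable.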