Bayesian posterior distributions arising in modern applications, including inverse problems for partial differential equation models in tomography and subsurface flow, are often computationally intractable because each evaluation of the data likelihood is expensive. To alleviate this problem, we consider using Gaussian process regression to build a surrogate model for the likelihood, which yields an approximate posterior distribution that is tractable in practice. This work serves as an introduction to Gaussian process regression, in particular in the context of building surrogate models for inverse problems, and presents new insights into a suitable choice of training points. We show that the error between the true and approximate posterior distributions can be bounded by the error between the true and approximate likelihoods, measured in the $L^2$-norm weighted by the true posterior. Efficiently bounding the likelihood error in this norm in turn suggests choosing the training points of the Gaussian process surrogate model based on the true posterior.
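The idea can be illustrated with a minimal one-dimensional sketch, which is not taken from the paper. It writes the likelihood as $L$, its Gaussian process surrogate as $L_N$, and the true posterior as $\pi$, and compares the posterior-weighted error $\|L - L_N\|_{L^2_\pi}$ for two sets of training points: one spread evenly over the prior support and one placed at quantiles of the true posterior. Everything specific here is an assumption made for illustration: the toy forward map `theta**3 + theta` standing in for a PDE solve, the uniform prior on $[-2, 2]$, the squared-exponential kernel and its length scale, and the use of the posterior mean of the Gaussian process as the surrogate. The Hellinger distance is included only as one common way to compare posteriors; the abstract does not state which distance the bound uses.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential covariance k(a, b) = exp(-(a - b)^2 / (2 l^2))."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_mean(x_train, y_train, x_test, length_scale=0.05, jitter=1e-8):
    """Posterior mean of a zero-mean GP conditioned on noise-free data."""
    K = sq_exp_kernel(x_train, x_train, length_scale)
    K += jitter * np.eye(len(x_train))  # small nugget for numerical stability
    weights = np.linalg.solve(K, y_train)
    return sq_exp_kernel(x_test, x_train, length_scale) @ weights

def likelihood(theta, y_obs=1.0, noise_std=0.1):
    """Toy Gaussian likelihood; theta**3 + theta is a hypothetical forward map."""
    return np.exp(-0.5 * ((y_obs - (theta**3 + theta)) / noise_std) ** 2)

# Assumed uniform prior on [-2, 2], discretised on a fine grid for quadrature.
theta = np.linspace(-2.0, 2.0, 4001)
d_theta = theta[1] - theta[0]
true_lik = likelihood(theta)
true_post = true_lik / (true_lik.sum() * d_theta)

def surrogate_errors(x_train):
    """Return (posterior-weighted L2 likelihood error, Hellinger distance)."""
    approx_lik = gp_mean(x_train, likelihood(x_train), theta)
    l2_err = np.sqrt(np.sum((true_lik - approx_lik) ** 2 * true_post) * d_theta)
    # The GP mean may dip below zero, so clip before normalising to a density.
    approx_post = np.maximum(approx_lik, 0.0)
    approx_post /= approx_post.sum() * d_theta
    sq_diff = (np.sqrt(true_post) - np.sqrt(approx_post)) ** 2
    hellinger = np.sqrt(0.5 * np.sum(sq_diff) * d_theta)
    return l2_err, hellinger

n_train = 9

# Design 1: training points spread evenly over the prior support.
prior_design = np.linspace(-2.0, 2.0, n_train)

# Design 2: training points at quantiles of the true posterior. The true
# posterior is only available here because the toy likelihood is cheap.
cdf = np.cumsum(true_post) * d_theta
posterior_design = np.interp(np.linspace(0.02, 0.98, n_train), cdf, theta)

err_prior, hell_prior = surrogate_errors(prior_design)
err_post, hell_post = surrogate_errors(posterior_design)

print(f"prior-based design:     L2 error = {err_prior:.3e}, Hellinger = {hell_prior:.3e}")
print(f"posterior-based design: L2 error = {err_post:.3e}, Hellinger = {hell_post:.3e}")
```

In this toy setting the posterior is concentrated in a narrow region, so evenly spaced points largely miss it while posterior quantiles resolve it, and the posterior-based design should give the smaller error in both measures. In a realistic problem the true posterior is not available in advance, so a design of this kind would have to be approximated, for example iteratively; the sketch sidesteps that issue.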