This work presents an efficient approach for accelerating multilevel Markov chain Monte Carlo (MCMC) sampling for large-scale problems using low-fidelity machine learning models. Conventional techniques for large-scale Bayesian inference often replace computationally expensive high-fidelity models with machine learning models, which introduces approximation errors. Our approach offers a computationally efficient alternative: it augments the high-fidelity models with low-fidelity ones within a hierarchical framework. The multilevel approach uses the low-fidelity machine learning model (MLM) to evaluate proposed samples inexpensively, which increases the rate at which the high-fidelity model accepts samples. The hierarchy in our multilevel algorithm is derived from a geometric multigrid hierarchy, and we use an MLM to accelerate sampling on the coarsest level. Training the machine learning model for the coarsest level significantly reduces the computational cost of generating training data and of training the model. We present an MCMC algorithm that accelerates coarsest-level sampling with the MLM and accounts for the approximation error it introduces. We provide theoretical proofs of detailed balance and show that our multilevel approach constitutes a consistent MCMC algorithm. Additionally, we derive conditions on the accuracy of the machine learning model under which hierarchical sampling becomes more efficient. We demonstrate our technique on a standard benchmark inference problem in groundwater flow, estimating the probability density of a quantity of interest with a four-level MCMC algorithm. The proposed algorithm accelerates multilevel sampling by a factor of two while achieving accuracy similar to that of the standard multilevel algorithm.
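The screening idea above, where a cheap model filters proposals before the expensive model is evaluated and a correction accounts for the cheap model's error, can be illustrated with two-stage delayed-acceptance Metropolis-Hastings. The following is a minimal sketch under stated assumptions: it is a two-level, one-dimensional simplification, not the four-level algorithm of this work, and the function names, the Gaussian toy posteriors (`log_fine`, `log_cheap`), and the step size are hypothetical choices for illustration only.

```python
import math
import random


def log_fine(x):
    """Hypothetical 'expensive' fine-level log-posterior: N(1, 1), up to a constant."""
    return -0.5 * (x - 1.0) ** 2


def log_cheap(x):
    """Hypothetical cheap surrogate (standing in for an ML model): biased N(1.2, 1.1^2)."""
    return -0.5 * ((x - 1.2) / 1.1) ** 2


def delayed_acceptance_mcmc(log_post_fine, log_post_cheap, x0, n_steps, step=1.0, seed=0):
    """Two-stage delayed-acceptance Metropolis-Hastings with a symmetric random walk.

    Stage 1 screens each proposal with the cheap surrogate; only proposals that
    pass are evaluated with the expensive fine-level posterior in stage 2.
    Returns the chain and the number of fine-level evaluations performed.
    """
    rng = random.Random(seed)
    x = x0
    lf_x, lc_x = log_post_fine(x), log_post_cheap(x)
    chain, fine_evals = [x], 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric Gaussian random-walk proposal
        lc_y = log_post_cheap(y)
        # Stage 1: Metropolis test against the surrogate only (cheap).
        # 1.0 - rng.random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) <= lc_y - lc_x:
            lf_y = log_post_fine(y)
            fine_evals += 1
            # Stage 2: the ratio pi_f(y) pi_c(x) / (pi_f(x) pi_c(y)) corrects the
            # surrogate's error, so the chain satisfies detailed balance w.r.t. pi_f.
            if math.log(1.0 - rng.random()) <= (lf_y - lf_x) - (lc_y - lc_x):
                x, lf_x, lc_x = y, lf_y, lc_y
        chain.append(x)
    return chain, fine_evals


if __name__ == "__main__":
    chain, fine_evals = delayed_acceptance_mcmc(log_fine, log_cheap, 0.0, 50_000)
    mean = sum(chain) / len(chain)
    print(f"posterior mean ~ {mean:.3f}, fine evaluations: {fine_evals} of 50000 steps")
```

Each proposal rejected in stage 1 costs only a surrogate evaluation, so `fine_evals` is typically below the number of steps; how much is saved depends on how closely the surrogate tracks the fine-level posterior, which is the kind of accuracy condition the abstract refers to.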