Markov chain Monte Carlo (MCMC) provides asymptotically consistent estimates of intractable posterior expectations as the number of iterations tends to infinity. However, in large-data applications, MCMC can be computationally expensive per iteration. This has catalyzed interest in approximating MCMC in a manner that improves computational speed per iteration but does not produce asymptotically consistent estimates. In this article, we propose estimators based on couplings of Markov chains to assess the quality of such asymptotically biased sampling methods. The estimators give empirical upper bounds on the Wasserstein distance between the limiting distribution of the asymptotically biased sampling method and the original target distribution of interest. We establish theoretical guarantees for our upper bounds and show that our estimators can remain effective in high dimensions. We apply our quality measures to stochastic gradient MCMC, variational Bayes, and Laplace approximations for tall data and to approximate MCMC for Bayesian logistic regression in 4500 dimensions and Bayesian linear regression in 50000 dimensions.
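The coupling idea rests on a standard fact: for any joint construction of two chains, the expected distance between them upper-bounds the Wasserstein distance between their marginal distributions. The following is a minimal toy sketch of that fact, not the estimators proposed in the article. It assumes two Gaussian autoregressive chains standing in for an "exact" sampler (limiting distribution N(0, I)) and a "biased" sampler (limiting distribution N(shift, scale^2 I)); the function names and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def coupled_w2_bound(n_iter=20000, burn_in=1000, dim=5, a=0.9,
                     shift=0.5, scale=1.2, common_noise=True, seed=0):
    """Time-averaged coupling estimate of an upper bound on the 2-Wasserstein
    distance between the limiting distributions of two toy AR(1) chains.

    Hypothetical setup: chain x has limit N(0, I); chain y has limit
    N(shift, scale^2 I), playing the role of a biased sampler.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    y = np.zeros(dim)
    noise_sd = np.sqrt(1.0 - a ** 2)
    sq_dists = []
    for t in range(n_iter):
        z_x = rng.standard_normal(dim)
        # Common random numbers couple the chains; independent noise is
        # also a valid coupling, but it typically gives a looser bound.
        z_y = z_x if common_noise else rng.standard_normal(dim)
        x = a * x + noise_sd * z_x
        y = shift * (1.0 - a) + a * y + scale * noise_sd * z_y
        if t >= burn_in:
            sq_dists.append(np.sum((x - y) ** 2))
    # sqrt of the average squared distance estimates sqrt(E||X - Y||^2),
    # which is at least W_2 between the two limiting distributions.
    return float(np.sqrt(np.mean(sq_dists)))


def exact_gaussian_w2(dim, shift, scale):
    """Closed-form W_2 between N(0, I) and N(shift * 1, scale^2 I)."""
    return float(np.sqrt(dim * (shift ** 2 + (scale - 1.0) ** 2)))


if __name__ == "__main__":
    print("common-noise bound:", coupled_w2_bound())
    print("independent bound: ", coupled_w2_bound(common_noise=False))
    print("exact W_2:         ", exact_gaussian_w2(5, 0.5, 1.2))
```

In this toy case the common-noise coupling happens to be optimal, so its estimate should sit close to the closed-form value, while the independent coupling gives a valid but much looser bound. Because the estimate is a finite time average, it is an upper bound only up to Monte Carlo error.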