Stochastic Gradient Langevin Dynamics (SGLD) is widely used to approximate Bayesian posterior distributions in statistical learning with large-scale data. Unlike many standard Markov chain Monte Carlo (MCMC) algorithms, SGLD does not have the posterior distribution as a stationary distribution, and two sources of error arise: the first is introduced by the Euler--Maruyama discretisation of a Langevin diffusion process, and the second comes from the data subsampling that makes the method usable with large-scale data. In this work, we consider an idealised version of SGLD to isolate the method's pure subsampling error, which we then regard as a best-case error for diffusion-based subsampling MCMC methods. To this end, we introduce and study the Stochastic Gradient Langevin Diffusion (SGLDiff), a continuous-time Markov process that follows the Langevin diffusion corresponding to a data subset and switches this data subset after exponential waiting times. We show that SGLDiff is exponentially ergodic and that the Wasserstein distance between the posterior and the limiting distribution of SGLDiff is bounded above by a fractional power of the mean waiting time. Finally, we place our results in the context of other analyses of SGLD.
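The switching mechanism of SGLDiff can be illustrated with a small simulation. This is a hypothetical toy sketch, not the authors' setting or code, and it rests on several assumptions: a Gaussian-mean model with a flat prior and unit-variance likelihood, so the posterior is N(mean(y), 1/N); subset potentials scaled by N/n, with subsets of size n drawn uniformly without replacement; and a helper `sgldiff_samples` invented here for illustration. Because the subset gradient N(theta - mean(y_S)) is linear, each subset diffusion is an Ornstein-Uhlenbeck process that can be advanced exactly between switches. As in SGLDiff, no Euler--Maruyama error enters, so only the subsampling effect remains. Short mean waiting times should bring the empirical variance close to the posterior variance 1/N. Long waiting times should push the variance toward that of a mixture over subsets, which is larger.

```python
import math

import numpy as np


def sgldiff_samples(y, n, mean_wait, t_end, dt_record, rng):
    """Record a toy SGLDiff path at times dt_record, 2*dt_record, ... <= t_end.

    Hypothetical toy model (an assumption, not from the paper): flat prior,
    y_i | theta ~ N(theta, 1), so the posterior is N(mean(y), 1/N). On a subset S
    of size n, the scaled potential (N/n) * sum_{i in S} (theta - y_i)^2 / 2 has
    gradient N * (theta - mean(y_S)), so the subset Langevin diffusion
    d theta = -N (theta - mean(y_S)) dt + sqrt(2) dW is an Ornstein-Uhlenbeck
    process that we advance exactly (no time discretisation error).
    """
    N = len(y)

    def draw_subset_mean():
        # Uniformly random data subset of size n, drawn without replacement.
        return y[rng.choice(N, size=n, replace=False)].mean()

    theta = float(y.mean())
    subset_mean = draw_subset_mean()
    t = 0.0
    next_switch = rng.exponential(mean_wait)  # exponential waiting time
    next_record = dt_record
    samples = []
    while next_record <= t_end:
        # Advance to whichever comes first: a subset switch or a recording time.
        t_next = min(next_switch, next_record)
        decay = math.exp(-N * (t_next - t))
        # Exact OU transition: mean reverts towards subset_mean, variance (1 - decay^2)/N.
        theta = (subset_mean + (theta - subset_mean) * decay
                 + math.sqrt((1.0 - decay**2) / N) * rng.standard_normal())
        t = t_next
        if t == next_record:
            samples.append(theta)
            next_record += dt_record
        if t == next_switch:
            # Switch to a fresh data subset and draw the next waiting time.
            subset_mean = draw_subset_mean()
            next_switch = t + rng.exponential(mean_wait)
    return np.array(samples)


rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=50)  # synthetic data, N = 50

# Fast switching (small mean waiting time) versus slow switching.
fast = sgldiff_samples(y, n=5, mean_wait=0.002, t_end=50.0, dt_record=0.05, rng=rng)
slow = sgldiff_samples(y, n=5, mean_wait=5.0, t_end=1000.0, dt_record=0.05, rng=rng)

print("posterior variance 1/N:", 1 / len(y))
print("fast switching variance:", fast.var())
print("slow switching variance:", slow.var())
```

The exact Ornstein-Uhlenbeck update is available only because the toy gradient is linear. For a general posterior, each subset diffusion would have to be discretised, which reintroduces the Euler--Maruyama error that SGLDiff is designed to exclude.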