Bayesian inference provides useful information about the parameters of models, both in computational statistics and, more recently, in the context of Bayesian neural networks. The computational cost of standard Monte Carlo methods for sampling posterior distributions in Bayesian inference scales linearly with the number of data points. One option to reduce it to a fraction of this cost is to resort to mini-batching in conjunction with unadjusted discretizations of Langevin dynamics, in which case only a random fraction of the data is used to estimate the gradient. However, this introduces additional noise into the dynamics and hence a bias in the invariant measure sampled by the Markov chain. We advocate using the so-called Adaptive Langevin dynamics (AdL), a modification of standard inertial Langevin dynamics with a dynamical friction which automatically corrects for the increased noise arising from mini-batching. We investigate the practical relevance of the assumptions underpinning AdL (constant covariance of the gradient estimator, Gaussian mini-batching noise), which are not satisfied in typical models of Bayesian inference, and quantify the bias induced by mini-batching in this case. We also suggest a possible extension of AdL to further reduce the bias on the posterior distribution, by considering a dynamical friction that depends on the current value of the parameter being sampled.
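To make the mechanism concrete, here is a minimal sketch of Adaptive Langevin dynamics with a mini-batch gradient on a toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the model (data x_i ~ N(theta, 1) with a flat prior), the simple first-order update, unit mass and temperature, the hypothetical function name `adaptive_langevin_gaussian_mean`, and the parameters `A` (injected noise strength) and `mu` (thermostat mass).

```python
import numpy as np


def adaptive_langevin_gaussian_mean(x, n_batch, h, n_steps, A=1.0, mu=1.0, seed=0):
    """Sample the posterior of theta for x_i ~ N(theta, 1) under a flat prior.

    Illustrative sketch only: first-order update, unit mass, unit temperature.
    Returns the traces of the parameter q and of the dynamical friction xi.
    """
    rng = np.random.default_rng(seed)
    n_data = x.shape[0]
    q, p, xi = 0.0, 0.0, A  # parameter, momentum, dynamical friction
    q_trace = np.empty(n_steps)
    xi_trace = np.empty(n_steps)
    for k in range(n_steps):
        # Mini-batch estimate of grad U(q) = sum_i (q - x_i), rescaled by N/n.
        batch = x[rng.integers(0, n_data, size=n_batch)]
        grad_est = (n_data / n_batch) * np.sum(q - batch)
        # Momentum update: force, friction, and injected noise of strength A.
        p += -h * grad_est - h * xi * p + np.sqrt(2.0 * A * h) * rng.standard_normal()
        # Position update using the new momentum.
        q += h * p
        # Friction update: xi grows when p^2 exceeds its target value of 1.
        xi += h * (p * p - 1.0) / mu
        q_trace[k] = q
        xi_trace[k] = xi
    return q_trace, xi_trace
```

In this toy setting the mini-batching noise is additive with constant variance, so it is close to the idealized case in which the friction can absorb it: `xi` is expected to drift above its initial value `A` as it compensates for the extra noise. The models discussed in the abstract do not generally satisfy these assumptions.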