Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC by constructing an unbiased estimate of the gradient of the log-posterior from a small, uniformly weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit high variance, which can degrade sampler performance. Variance control has traditionally been addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method for adaptively adjusting the subsample size at each iteration of the algorithm, increasing it in regions of the sample space where the gradient is harder to estimate. We demonstrate that this approach can maintain the same level of accuracy while substantially reducing the average subsample size used.
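To make the idea of non-uniform subsampling concrete, the following is a minimal sketch of an importance-weighted stochastic gradient estimator. It is an illustration under stated assumptions, not the specific scheme proposed here: the function name `weighted_subsample_gradient`, the per-datum gradient callback `grad_fn`, and the choice of sampling probabilities `probs` are all hypothetical. The sketch draws indices with replacement from `probs` and reweights each sampled gradient by `1 / (n * probs[i])`, which keeps the estimate of the full-data gradient sum unbiased for any strictly positive `probs`. Any prior gradient term would be added separately.

```python
import numpy as np


def weighted_subsample_gradient(grad_fn, theta, data, probs, n, rng):
    """Estimate sum_i grad_fn(theta, data[i]) from n non-uniform draws.

    grad_fn : hypothetical callback returning the gradient of one
              log-likelihood term as a 1-D array.
    probs   : strictly positive sampling probabilities summing to 1
              (how to choose them is an assumption left to the caller).
    """
    num_points = len(data)
    # Sample indices with replacement according to the discrete distribution.
    idx = rng.choice(num_points, size=n, replace=True, p=probs)
    grads = np.stack([grad_fn(theta, data[i]) for i in idx])
    # Importance weights 1 / (n * p_i) make the estimator unbiased.
    weights = 1.0 / (n * probs[idx])
    return (weights[:, None] * grads).sum(axis=0)


def gaussian_mean_grad(theta, x):
    """Gradient in theta of log N(x | theta, 1), used as a toy model."""
    return x - theta


# Toy usage: four observations, gradient evaluated at theta = 0.
toy_data = np.array([1.0, 2.0, 3.0, 4.0])
toy_theta = np.array([0.0])
uniform_probs = np.full(len(toy_data), 1.0 / len(toy_data))
estimate = weighted_subsample_gradient(
    gaussian_mean_grad, toy_theta, toy_data, uniform_probs,
    n=2, rng=np.random.default_rng(0),
)
```

With uniform `probs` this reduces to the usual SGMCMC estimator, in which the subsample sum is scaled by `N / n`. In this one-dimensional toy case, choosing `probs` proportional to the per-datum gradient magnitudes removes the sampling variance entirely, which illustrates why favouring high-impact data points can help. In practice those magnitudes are unknown at the current iterate and would have to be approximated.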