Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC by constructing an unbiased estimate of the gradient of the log-posterior from a small, uniformly weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit high variance and degrade sampler performance. Variance control has traditionally been addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method for adaptively adjusting the subsample size at each iteration of the algorithm, so that the subsample size increases in areas of the sample space where the gradient is harder to estimate. We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size used.
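The idea of preferential subsampling can be illustrated with a generic importance-weighted gradient estimator. The following is a minimal sketch, not the scheme developed in the paper: it assumes a toy Gaussian mean model with unit variance, and the function name `weighted_subsample_gradient` and the choice of weights (proportional to each point's gradient magnitude at the current parameter value) are hypothetical. Dividing each sampled gradient by its selection probability keeps the estimator unbiased for the full-data gradient sum.

```python
import numpy as np

def weighted_subsample_gradient(theta, x, probs, n, rng):
    """Importance-weighted estimate of sum_i d/dtheta log N(x_i | theta, 1).

    Indices are drawn with replacement according to `probs`; each sampled
    gradient is divided by its selection probability, so the average of the
    reweighted terms is unbiased for the full-data sum.
    """
    idx = rng.choice(x.shape[0], size=n, replace=True, p=probs)
    per_point = x[idx] - theta  # per-point log-likelihood gradient (toy model)
    return np.mean(per_point / probs[idx])

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=1000)
theta = 0.0
full_grad = np.sum(x - theta)

# Assumed weighting for illustration only: proportional to gradient magnitude.
# Computing these exactly needs a full pass over the data, so a practical
# scheme would have to approximate them.
mags = np.abs(x - theta) + 1e-8
weighted_probs = mags / mags.sum()
uniform_probs = np.full(x.shape[0], 1.0 / x.shape[0])

n_sub, n_rep = 10, 2000
est_weighted = np.array([
    weighted_subsample_gradient(theta, x, weighted_probs, n_sub, rng)
    for _ in range(n_rep)
])
est_uniform = np.array([
    weighted_subsample_gradient(theta, x, uniform_probs, n_sub, rng)
    for _ in range(n_rep)
])

print("full gradient:", full_grad)
print("weighted mean / variance:", est_weighted.mean(), est_weighted.var())
print("uniform  mean / variance:", est_uniform.mean(), est_uniform.var())
```

In this toy setting both estimators average to roughly the full-data gradient, while the non-uniform weights give a lower variance at the same subsample size. That reduction is what would allow a smaller subsample for a given accuracy.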