Tuning algorithms such as stochastic gradient descent (SGD) and stochastic gradient Langevin dynamics (SGLD) for approximate sampling and uncertainty quantification remains challenging, particularly in the practically relevant settings where the batch size is large or the model is misspecified. Existing theory that provides tuning guidance relies on continuous-time limits or strong statistical assumptions, which can become quantitatively inaccurate in these regimes. We address these shortcomings by proposing new discrete-time approximations to SG(L)D with and without momentum, which enable accurate predictions of the stationary covariance, the covariance of the iterate average, and the integrated autocorrelation time. Moreover, we prove quantitative, non-asymptotic error bounds showing that these estimates are sufficiently accurate for practical tuning and uncertainty quantification. Numerical experiments demonstrate that our theory yields improved tuning guidance across a range of models and data-generating distributions where existing approaches fail, including when the $\beta$-divergence is used in place of the log-loss to obtain statistically robust inferences.
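To illustrate why a discrete-time analysis can differ from a continuous-time limit, the sketch below works through the simplest possible case: a one-dimensional quadratic loss $f(x) = a x^2/2$ with additive Gaussian gradient noise of variance `s2` and optional Langevin noise at temperature `tau` (`tau = 0` gives plain SGD). These modeling choices and all function names are illustrative assumptions; this is a textbook linear-recursion calculation, not the approximation proposed in this work. The SG(L)D iterates then form an AR(1) chain $x_{k+1} = (1 - ha)x_k - h\xi_k + \sqrt{2h\tau}\,\eta_k$, whose exact stationary variance $(h s_2 + 2\tau)/(a(2 - ha))$ can be compared with the Ornstein-Uhlenbeck (continuous-time) prediction $(h s_2 + 2\tau)/(2a)$.

```python
import math
import random


def discrete_stationary_var(a, h, s2, tau):
    """Exact stationary variance of the (hypothetical) scalar recursion
    x_{k+1} = (1 - h*a) x_k - h*xi_k + sqrt(2*h*tau) * eta_k,
    with xi_k ~ N(0, s2) and eta_k ~ N(0, 1) independent."""
    rho = 1.0 - h * a
    assert abs(rho) < 1.0, "recursion is unstable unless 0 < h*a < 2"
    # Solve v = rho^2 v + h^2 s2 + 2 h tau for v.
    return (h * h * s2 + 2.0 * h * tau) / (1.0 - rho * rho)


def continuous_stationary_var(a, h, s2, tau):
    """Stationary variance of the continuous-time (OU) approximation
    dX = -a X dt + sqrt(h*s2 + 2*tau) dW, i.e. sigma^2 / (2a)."""
    return (h * s2 + 2.0 * tau) / (2.0 * a)


def integrated_autocorr_time(a, h):
    """IAT of the AR(1) chain with coefficient rho = 1 - h*a:
    1 + 2 * sum_{k>=1} rho^k = (1 + rho) / (1 - rho)."""
    rho = 1.0 - h * a
    return (1.0 + rho) / (1.0 - rho)


def simulate_var(a, h, s2, tau, n=200_000, burn=1_000, seed=0):
    """Monte Carlo estimate of the stationary variance of the recursion."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for k in range(n + burn):
        xi = rng.gauss(0.0, math.sqrt(s2))   # stochastic-gradient noise
        eta = rng.gauss(0.0, 1.0)            # injected Langevin noise
        x = (1.0 - h * a) * x - h * xi + math.sqrt(2.0 * h * tau) * eta
        if k >= burn:
            xs.append(x)
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs) / len(xs)


if __name__ == "__main__":
    a, h, s2, tau = 1.0, 1.5, 1.0, 0.0   # large step size, plain SGD
    print("discrete  :", discrete_stationary_var(a, h, s2, tau))
    print("continuous:", continuous_stationary_var(a, h, s2, tau))
    print("simulated :", simulate_var(a, h, s2, tau))
```

With `h * a = 1.5`, the exact discrete-time variance is 3.0 while the continuous-time prediction is 0.75, a factor of $2/(2 - ha) = 4$; the two agree only as $ha \to 0$. In this toy model the ratio depends only on $ha$, which hints at why tuning guidance derived from continuous-time limits can be quantitatively off for aggressive step sizes.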