The tuning of stochastic gradient algorithms (SGAs) for optimization and sampling is often based on heuristics and trial-and-error rather than generalizable theory. We address this theory–practice gap by characterizing the large-sample statistical asymptotics of SGAs via a joint step-size–sample-size scaling limit. We show that iterate averaging with a large fixed step size is robust to the choice of tuning parameters and asymptotically has covariance proportional to that of the MLE sampling distribution. We also prove a Bernstein–von Mises-like theorem to guide tuning, including for generalized posteriors that are robust to model misspecification. Numerical experiments validate our results and recommendations in realistic finite-sample regimes. Our work lays the foundation for a systematic analysis of other stochastic gradient Markov chain Monte Carlo algorithms for a wide range of models.
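The following is a minimal sketch of iterate averaging with a fixed step size. It assumes a well-specified linear-Gaussian regression model, where the MLE is the ordinary least-squares estimate; this model is not one of the paper's experiments. The function name `averaged_sgd_least_squares`, the step size, the iteration count and the burn-in length are hypothetical illustrative choices, not the authors' recommended tuning. For this quadratic loss, the stationary mean of constant-step SGD with uniformly sampled data points equals the least-squares minimizer. The averaged iterate should therefore land close to the MLE, even though individual iterates keep fluctuating on a scale set by the step size.

```python
import numpy as np


def averaged_sgd_least_squares(X, y, step_size, n_iters, burn_in, seed=0):
    """Hypothetical helper: constant-step SGD with iterate averaging for least squares.

    Each step uses the gradient of the squared-error loss at one data point
    drawn uniformly at random (with replacement). The returned estimate is the
    average of all iterates produced after the first `burn_in` steps.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    running_sum = np.zeros(d)
    n_averaged = 0
    for k in range(n_iters):
        i = rng.integers(n)
        # Gradient of 0.5 * (x_i^T theta - y_i)^2 with respect to theta.
        grad = X[i] * (X[i] @ theta - y[i])
        # Fixed (non-decaying) step size.
        theta = theta - step_size * grad
        if k >= burn_in:
            running_sum += theta
            n_averaged += 1
    return running_sum / n_averaged


# Simulated data from an assumed model y = X @ theta_true + standard normal noise.
rng = np.random.default_rng(42)
n, d = 1000, 2
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true + rng.normal(size=n)

# Under Gaussian noise, the least-squares solution is the MLE.
theta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

theta_avg = averaged_sgd_least_squares(
    X, y, step_size=0.05, n_iters=20_000, burn_in=2_000
)
print("MLE:", theta_mle, "averaged SGD:", theta_avg)
```

The averaging step removes only the fluctuation of the iterates around the MLE for this one dataset. The abstract's covariance statement concerns the sampling distribution across datasets, and this single-dataset sketch does not check it.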