Stochastic Gradient Descent (SGD) is one of the simplest and most popular algorithms in modern statistical and machine learning, owing to its computational and memory efficiency. Various averaging schemes have been proposed to accelerate the convergence of SGD in different settings. In this paper, we explore a general averaging scheme for SGD. Specifically, we establish the asymptotic normality of a broad range of weighted averaged SGD solutions and provide asymptotically valid online inference approaches. Furthermore, drawing on the optimal weight for the linear model in terms of non-asymptotic mean squared error (MSE), we propose an adaptive averaging scheme that achieves both the optimal statistical rate and favorable non-asymptotic convergence.
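To make the idea of a weighted averaged SGD solution concrete, here is a minimal sketch for a linear model with squared loss. It is not the adaptive scheme proposed in the paper. The polynomial weight family `w_t = t**weight_power`, the step-size schedule `lr0 * t**(-decay)`, and the names `weighted_averaged_sgd` and `make_samples` are all assumptions chosen for illustration. Setting `weight_power=0` gives the uniform (Polyak-Ruppert) average.

```python
import random


def weighted_averaged_sgd(samples, dim, lr0=0.2, decay=0.505, weight_power=0.0):
    """Run SGD on squared loss; return (last_iterate, weighted_average).

    The weights w_t = t**weight_power are an illustrative assumption,
    not the weights derived in the paper.
    """
    x = [0.0] * dim        # current SGD iterate
    avg = [0.0] * dim      # running weighted average of the iterates
    weight_sum = 0.0
    for t, (a, b) in enumerate(samples, start=1):
        # Stochastic gradient of 0.5 * (a.x - b)^2 is (a.x - b) * a.
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        step = lr0 * t ** (-decay)
        x = [xi - step * residual * ai for xi, ai in zip(x, a)]
        # Online update: avg_t = avg_{t-1} + (w_t / W_t) * (x_t - avg_{t-1}),
        # where W_t is the sum of the weights so far.
        w = float(t) ** weight_power
        weight_sum += w
        avg = [m + (w / weight_sum) * (xi - m) for m, xi in zip(avg, x)]
    return x, avg


def make_samples(n, truth, noise=0.1, seed=0):
    """Generate synthetic (a, b) pairs with b = a.truth + Gaussian noise."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        a = [rng.gauss(0.0, 1.0) for _ in truth]
        b = sum(ai * ti for ai, ti in zip(a, truth)) + rng.gauss(0.0, noise)
        out.append((a, b))
    return out


if __name__ == "__main__":
    truth = [1.0, -2.0]
    data = make_samples(5000, truth)
    for power in (0.0, 2.0):
        last, avg = weighted_averaged_sgd(data, dim=2, weight_power=power)
        print(power, last, avg)
```

Because the average is updated online, the memory cost stays at one extra vector regardless of the number of iterations. Larger values of `weight_power` put more mass on recent iterates and so forget the starting point faster, which is one way to trade off asymptotic efficiency against non-asymptotic behavior.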