We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the exponent $3/4$ is the best known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
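To make the construction concrete, the following is a minimal numerical sketch (not the paper's code) of the estimator discussed above: constant step-size SGD on a strongly convex quadratic, Polyak-Ruppert averaging of the iterates, and the standard Richardson-Romberg extrapolation across two chains run with step sizes $\gamma$ and $2\gamma$. The quadratic objective, the additive noise model, and the step-size value are illustrative assumptions, not the paper's setting.

\begin{verbatim}
# Minimal sketch: Polyak-Ruppert averaged constant step-size SGD
# combined with Richardson-Romberg extrapolation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 100_000
A = np.diag(np.linspace(1.0, 4.0, d))   # Hessian of the quadratic objective
theta_star = np.ones(d)                  # minimizer

def noisy_grad(theta):
    # unbiased stochastic gradient: exact gradient plus additive noise
    return A @ (theta - theta_star) + rng.normal(size=d)

def averaged_sgd(gamma, n):
    # constant step-size SGD with a running Polyak-Ruppert average
    theta = np.zeros(d)
    avg = np.zeros(d)
    for k in range(1, n + 1):
        theta = theta - gamma * noisy_grad(theta)
        avg += (theta - avg) / k
    return avg

gamma = 0.05
bar_gamma = averaged_sgd(gamma, n)       # PR average, step size gamma
bar_2gamma = averaged_sgd(2 * gamma, n)  # PR average, step size 2*gamma
theta_rr = 2 * bar_gamma - bar_2gamma    # Richardson-Romberg extrapolation

print("error, step gamma   :", np.linalg.norm(bar_gamma - theta_star))
print("error, extrapolated :", np.linalg.norm(theta_rr - theta_star))
\end{verbatim}

The extrapolated combination $2\bar{\theta}_n^{(\gamma)} - \bar{\theta}_n^{(2\gamma)}$ cancels the leading $\mathcal{O}(\gamma)$ term in the asymptotic bias of the averaged iterates, which is the mechanism behind the bias reduction mentioned above; whether the two chains share the same noise sequence or use independent noise is a design choice left open in this sketch.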