We study the problem of solving strongly convex and smooth unconstrained optimization problems using stochastic first-order algorithms. We devise a novel algorithm, referred to as \emph{Recursive One-Over-T SGD} (\textsf{ROOT-SGD}), based on an easily implementable, recursive averaging of past stochastic gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense. On the nonasymptotic side, we prove risk bounds on the last iterate of \textsf{ROOT-SGD} whose leading-order terms match the optimal statistical risk with a unity pre-factor, along with a higher-order term that scales at the sharp rate of $O(n^{-3/2})$ under a Lipschitz condition on the Hessian matrix. On the asymptotic side, we show that under a mild, one-point Hessian continuity condition, the rescaled last iterate of (multi-epoch) \textsf{ROOT-SGD} converges asymptotically to a Gaussian limit with the Cram\'{e}r-Rao optimal asymptotic covariance, for a broad range of step-size choices.
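To make the "recursive averaging of past stochastic gradients" concrete, here is a minimal NumPy sketch of a ROOT-SGD-style update on a toy strongly convex quadratic. The recursion coefficients, step size, noise model, and function names below are illustrative assumptions for exposition, not the paper's exact specification.

```python
import numpy as np

def root_sgd(grad, x0, eta, n_steps, rng):
    """Sketch of a recursive one-over-t averaged SGD loop (assumed form).

    grad(x, xi): stochastic gradient oracle, with xi the noise sample.
    """
    x_prev = np.asarray(x0, dtype=float)
    xi = rng.standard_normal(x_prev.shape)
    v = grad(x_prev, xi)                  # v_1: plain stochastic gradient
    x = x_prev - eta * v
    for t in range(2, n_steps + 1):
        xi = rng.standard_normal(x.shape)
        g_new = grad(x, xi)               # gradient at current iterate
        g_old = grad(x_prev, xi)          # same noise sample, previous iterate
        # Recursive one-over-t average: blend the fresh gradient with a
        # variance-reduced correction of the running estimate (assumed form).
        v = g_new + (1.0 - 1.0 / t) * (v - g_old)
        x_prev, x = x, x - eta * v
    return x

# Illustrative usage on f(x) = 0.5 * ||A x - b||^2 with Gaussian gradient noise.
rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
x_star = np.linalg.solve(A.T @ A, A.T @ b)          # exact minimizer
noisy_grad = lambda x, xi: A.T @ (A @ x - b) + 0.1 * xi
x_hat = root_sgd(noisy_grad, np.zeros(2), eta=0.05, n_steps=5000, rng=rng)
err = np.linalg.norm(x_hat - x_star)                # last-iterate error
```

The last iterate `x_hat` (rather than an averaged iterate) is the quantity the abstract's risk bounds concern; on this toy problem its error `err` shrinks as the sample size grows.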