The non-asymptotic analysis of Stochastic Gradient Descent (SGD) typically yields bounds that decompose into a bias term and a variance term. In this work, we focus on the bias component and study the extent to which SGD can match the optimal convergence behavior of deterministic gradient descent. Assuming only (strong) convexity and smoothness of the objective, we derive new bounds that are bias-optimal, in the sense that the bias term coincides with the worst-case rate of gradient descent. Our results hold for the full range of constant step-sizes $\gamma L \in (0,2)$, including critical and large step-size regimes that had not previously been covered without additional assumptions on the variance. The bounds are obtained by constructing a simple Lyapunov energy whose monotonicity yields sharp convergence guarantees. To design the parameters of this energy, we employ the Performance Estimation Problem framework, which we also use to provide numerical evidence for the optimality of the associated variance terms.
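For context, bounds of this kind typically take the following schematic form (a sketch only; the noise model and the constant $c(\gamma)$ below are illustrative assumptions, not the bounds derived in this work): for an $L$-smooth, $\mu$-strongly convex objective $f$ with minimizer $x_\star$, stochastic gradients whose variance at $x_\star$ is at most $\sigma^2$, and a constant step-size $\gamma$,
\[
\mathbb{E}\,\|x_k - x_\star\|^2 \;\le\; \underbrace{\rho(\gamma)^{k}\,\|x_0 - x_\star\|^2}_{\text{bias}} \;+\; \underbrace{c(\gamma)\,\sigma^2}_{\text{variance}},
\]
where bias-optimality means that $\rho(\gamma)$ matches the worst-case contraction factor of deterministic gradient descent, $\max\{(1-\gamma\mu)^2,\,(1-\gamma L)^2\}$, over the entire range $\gamma L \in (0,2)$.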