Stochastic gradient descent (SGD) is the workhorse of large-scale learning, yet classical analyses rely on assumptions that are either too strong (bounded variance) or too coarse (uniform noise). The expected smoothness (ES) condition has emerged as a flexible alternative that ties the second moment of stochastic gradients to the objective value and the full gradient. This paper presents a self-contained convergence analysis of SGD under ES. We (i) refine ES with interpretations and sampling-dependent constants; (ii) derive bounds on the expected squared norm of the full gradient; and (iii) prove $O(1/K)$ rates with explicit residual errors for various step-size schedules. All proofs are given in full detail in the appendix. Our treatment unifies and extends recent threads (Khaled and Richt\'arik, 2020; Umeda and Iiduka, 2025).
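For context, a standard formulation of the ES condition (following Khaled and Richt\'arik, 2020) is sketched below; the constants $A$, $B$, $C$ are placeholders for the sampling-dependent constants refined in the paper, and $f^{\inf}$ denotes a lower bound on the objective:
\[
\mathbb{E}\bigl[\|\nabla f_{\xi}(x)\|^{2}\bigr] \;\le\; 2A\bigl(f(x) - f^{\inf}\bigr) + B\,\|\nabla f(x)\|^{2} + C \qquad \text{for all } x \in \mathbb{R}^{d},
\]
where $\nabla f_{\xi}(x)$ is the stochastic gradient computed from the sample $\xi$ and $\nabla f(x)$ is the full gradient.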