The stochastic gradient descent method with momentum (SGDM) is a common approach for solving large-scale and stochastic optimization problems. Despite its popularity, the convergence behavior of SGDM remains only partially understood in nonconvex scenarios. This is primarily due to the absence of a sufficient descent property and the difficulty of simultaneously controlling the momentum and stochastic errors in an almost sure sense. To address these challenges, we investigate the behavior of SGDM over specific time windows, rather than examining the descent of consecutive iterates as in traditional studies. This time window-based approach simplifies the convergence analysis and enables us to establish iterate convergence results for SGDM under the {\L}ojasiewicz property. We further provide local convergence rates that depend on the underlying {\L}ojasiewicz exponent and the utilized step size schemes.
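For reference, a standard heavy-ball form of the SGDM iteration reads as follows; the notation here ($x_k$ for the iterates, $\alpha_k$ for the step sizes, $\beta$ for the momentum parameter, $g_k$ for a stochastic gradient estimate) is assumed for illustration and need not match the paper's own conventions:

```latex
% SGDM in heavy-ball form (notation assumed, not taken from this abstract):
%   g_k      : stochastic estimate of \nabla f(x_k)
%   \alpha_k : step size at iteration k
%   \beta    : momentum parameter, \beta \in [0, 1)
x_{k+1} = x_k - \alpha_k g_k + \beta \, (x_k - x_{k-1}).
```

The momentum term $\beta\,(x_k - x_{k-1})$ is what breaks the sufficient descent property available for plain SGD, which motivates the time-window analysis described above.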