The convergence behavior of Stochastic Gradient Descent (SGD) depends crucially on the stepsize configuration. With a constant stepsize, the SGD iterates form a Markov chain that converges quickly during the initial transient phase; once the chain reaches stationarity, however, the iterates oscillate around the optimum without making further progress. In this paper, we study convergence diagnostics for SGD with constant stepsize, aiming to develop an effective dynamic stepsize scheme. We propose a novel coupling-based convergence diagnostic, which monitors the distance between two coupled SGD iterates to detect stationarity. The diagnostic statistic is simple to compute and provably tracks the transition from transience to stationarity. We conduct extensive numerical experiments comparing our method against various existing approaches. The proposed coupling-based stepsize scheme achieves superior performance across a diverse set of convex and non-convex problems, and our results demonstrate its robustness to a wide range of hyperparameters.
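The idea of monitoring the distance between two coupled SGD iterates can be illustrated with a minimal sketch. Below, two SGD chains started from different points share the same randomness at every step (the coupling); when their distance has contracted below a fraction of its initial value, the chains are taken to have reached stationarity and the stepsize is decayed. The function name `coupled_sgd_diagnostic`, the threshold rule, and the decay factor are all illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def coupled_sgd_diagnostic(grad, x0, y0, step=0.1, thresh=1e-3,
                           decay=0.5, n_steps=2000, rng=None):
    """Run two SGD chains coupled through shared sampling noise and
    shrink the stepsize once their distance signals stationarity.

    `grad(x, r)` returns a stochastic gradient at x using randomness
    from the generator `r`; both chains call it with the same seed,
    so the noise is shared between them (the coupling).
    NOTE: this is an illustrative sketch, not the paper's algorithm.
    """
    rng = rng or np.random.default_rng(0)
    x, y = np.asarray(x0, float).copy(), np.asarray(y0, float).copy()
    d0 = np.linalg.norm(x - y)  # baseline distance for the diagnostic
    for _ in range(n_steps):
        seed = int(rng.integers(1 << 31))  # shared randomness per step
        gx = grad(x, np.random.default_rng(seed))
        gy = grad(y, np.random.default_rng(seed))
        x, y = x - step * gx, y - step * gy
        # Diagnostic: once the coupled chains have contracted to a small
        # fraction of their initial separation, declare stationarity,
        # decay the stepsize, and reset the baseline distance.
        if np.linalg.norm(x - y) < thresh * d0:
            step *= decay
            d0 = max(np.linalg.norm(x - y), 1e-12)
    return x, step
```

On a strongly convex quadratic with additive gradient noise, the shared noise cancels in the difference of the two chains, so their distance contracts geometrically during the transient phase, which is exactly what the diagnostic exploits.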