We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes how the performance degrades as the step size becomes too large. For convex problems, we show that this quantity directly impacts the suboptimality bound of the method. Most importantly, our analysis provides direct theoretical evidence that adaptive step-size methods, such as SPS or NGN, are more robust than SGD. This allows us to quantify the advantage of these adaptive methods beyond empirical evaluation. Finally, we show through experiments that our theoretical bound qualitatively mirrors the actual performance as a function of the step size, even for nonconvex problems.