Recent empirical work has revealed an intriguing property of deep learning models: given a fixed stepsize, the sharpness (the largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value (2/η for gradient descent with stepsize η), at which point the optimizer operates at the edge of stability (Cohen et al., 2022). We empirically investigate how the sharpness evolves when using stepsize tuners, namely the Armijo linesearch and Polyak stepsizes, which adapt the stepsize across iterations to local quantities such as, implicitly, the sharpness itself. We find that the surprisingly poor performance of a classical Armijo linesearch in the deterministic setting may be well explained by its tendency to continually increase the sharpness of the objective. In contrast, we observe that Polyak stepsizes generally operate at the edge of stability or even slightly beyond it, outperforming their Armijo and constant-stepsize counterparts in the deterministic setting. We conclude with an analysis suggesting that unlocking the potential of stepsize tuners requires an understanding of the joint dynamics of the stepsize and the sharpness.
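The abstract relies on three quantities: the sharpness, the stepsize chosen by an Armijo linesearch, and the Polyak stepsize. The sketch below illustrates their definitions on a hypothetical two-dimensional quadratic rather than a neural network, so it should be read as an illustration of the terms and not as the experimental setup. All function names (`f`, `grad`, `sharpness`, `armijo_step`, `polyak_step`) and parameter defaults (`eta0=1.0`, `c=1e-4`, `beta=0.5`) are assumptions chosen for the example, and the Polyak step assumes the optimal value f* is known (here f* = 0).

```python
# Hypothetical toy model (not the paper's setup):
# f(x) = 0.5 * sum_i a[i] * x[i]**2, a diagonal quadratic with Hessian diag(a),
# minimum value f* = 0 at x = 0, and sharpness max(a).


def f(x, a):
    """Objective value of the diagonal quadratic."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(x, a):
    """Gradient of the diagonal quadratic: a * x elementwise."""
    return [ai * xi for ai, xi in zip(a, x)]


def sharpness(a, iters=100):
    """Estimate the largest Hessian eigenvalue by power iteration.

    Only Hessian-vector products are used, as one would do for a network;
    here the product is diag(a) @ v. Assumes all a[i] > 0.
    """
    hvp = lambda v: [ai * vi for ai, vi in zip(a, v)]
    v = [1.0] * len(a)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(v)
        norm = sum(h * h for h in hv) ** 0.5
        v = [h / norm for h in hv]  # renormalize to avoid overflow
        # Rayleigh quotient v^T H v of the unit vector v
        lam = sum(vi * hi for vi, hi in zip(v, hvp(v)))
    return lam


def armijo_step(x, a, eta0=1.0, c=1e-4, beta=0.5, max_halvings=60):
    """Backtracking Armijo linesearch along the negative gradient.

    Shrinks the trial stepsize by `beta` until the sufficient-decrease
    condition f(x - eta * g) <= f(x) - c * eta * ||g||^2 holds.
    """
    g = grad(x, a)
    gg = sum(gi * gi for gi in g)
    fx = f(x, a)
    eta = eta0
    for _ in range(max_halvings):
        trial = [xi - eta * gi for xi, gi in zip(x, g)]
        if f(trial, a) <= fx - c * eta * gg:
            break
        eta *= beta
    return eta


def polyak_step(x, a, f_star=0.0):
    """Polyak stepsize (f(x) - f*) / ||grad f(x)||^2.

    Assumes f* is known and x is not a stationary point (nonzero gradient).
    """
    g = grad(x, a)
    gg = sum(gi * gi for gi in g)
    return (f(x, a) - f_star) / gg


if __name__ == "__main__":
    a = [10.0, 1.0]  # sharpness 10, so the gradient descent threshold is 2/10 = 0.2
    x = [1.0, 0.0]   # iterate along the top eigenvector
    print("sharpness:", sharpness(a))
    print("2 / sharpness:", 2.0 / sharpness(a))
    print("Armijo step:", armijo_step(x, a))  # 0.125
    print("Polyak step:", polyak_step(x, a))  # 0.05
```

On this quadratic, when the iterate lies along the top eigenvector, the Armijo condition admits only stepsizes η ≤ 2(1 − c)/λ_max, just below the 2/λ_max threshold for stable gradient descent; the backtracked value 0.125 sits under 2/10 = 0.2. The Polyak step at the same point equals 1/(2λ_max) = 0.05. Because the sharpness of a quadratic is fixed, this example cannot reproduce the sharpness dynamics studied above; it only shows how each rule responds to local curvature.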