Heavy Ball (HB) is one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic into gradient-based methods accelerates the training of modern machine learning models. However, progress on establishing a theoretical foundation for this acceleration lags far behind its empirical success. Existing provable acceleration results cover only quadratic or close-to-quadratic functions, because current techniques for showing HB's acceleration are limited to the case where the Hessian is fixed. In this work, we develop new techniques that help show acceleration beyond quadratics by analyzing how the change of the Hessian between two consecutive time points affects the convergence speed. Based on our technical results, we identify a class of Polyak-\L{}ojasiewicz (PL) optimization problems for which provable acceleration can be achieved via HB. Moreover, our analysis demonstrates a benefit of adaptively setting the momentum parameter. (Update: 08/29/2023) An erratum has been added in Appendix J. This updated version fixes an issue in the previous version: the acceleration result for HB beyond quadratics in this work requires an additional condition, which holds naturally when the dimension is one or, more broadly, when the Hessian is diagonal. We elaborate on the issue in Appendix J.
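For readers unfamiliar with the method, the following is a minimal sketch of the Heavy Ball update, x_{t+1} = x_t - lr * grad f(x_t) + beta * (x_t - x_{t-1}), on the classical fixed-Hessian case that the abstract describes as already understood. It is not the analysis or the adaptive momentum scheme of this work. The test problem (a two-dimensional quadratic with diagonal Hessian), the function names, and the parameter choices are assumptions made for illustration; the step size and momentum follow Polyak's standard tuning for quadratics with curvature between mu and L.

```python
import math

def objective(x, hess_diag):
    # f(x) = 0.5 * sum_i h_i * x_i^2, a quadratic with a fixed diagonal Hessian.
    return 0.5 * sum(h * xi * xi for h, xi in zip(hess_diag, x))

def gradient(x, hess_diag):
    return [h * xi for h, xi in zip(hess_diag, x)]

def heavy_ball(x0, hess_diag, lr, beta, steps):
    # Heavy Ball: x_{t+1} = x_t - lr * grad f(x_t) + beta * (x_t - x_{t-1}).
    # With beta = 0 this reduces to plain gradient descent.
    x_prev = list(x0)  # x_{-1} is set equal to x_0, so the first momentum term is zero
    x = list(x0)
    for _ in range(steps):
        g = gradient(x, hess_diag)
        x_next = [xi - lr * gi + beta * (xi - xpi)
                  for xi, gi, xpi in zip(x, g, x_prev)]
        x_prev, x = x, x_next
    return x

# Illustrative problem (an assumption, not from the paper): condition number L / mu = 100.
mu, L = 1.0, 100.0
hess_diag = [mu, L]
x0 = [1.0, 1.0]
steps = 100

# Gradient descent with the step size 2 / (mu + L).
x_gd = heavy_ball(x0, hess_diag, lr=2.0 / (mu + L), beta=0.0, steps=steps)

# Heavy Ball with Polyak's classical parameters for quadratics.
sqrt_mu, sqrt_L = math.sqrt(mu), math.sqrt(L)
lr_hb = 4.0 / (sqrt_L + sqrt_mu) ** 2
beta_hb = ((sqrt_L - sqrt_mu) / (sqrt_L + sqrt_mu)) ** 2
x_hb = heavy_ball(x0, hess_diag, lr=lr_hb, beta=beta_hb, steps=steps)

f_gd = objective(x_gd, hess_diag)
f_hb = objective(x_hb, hess_diag)
print(f"gradient descent: f = {f_gd:.3e}")
print(f"heavy ball:       f = {f_hb:.3e}")
```

On this fixed-Hessian example, the momentum run reaches a far smaller objective value than gradient descent in the same number of steps. The diagonal Hessian used here also happens to satisfy the additional condition mentioned in the update note, but the sketch says nothing about the non-quadratic PL problems that the work itself addresses.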