Recently, many machine learning optimizers have been analysed by relating them to differential equations that arise as their asymptotic limit when the step size goes to zero. In other words, each optimizer can be seen as a finite difference scheme applied to a continuous dynamical system. However, most results in the literature concern constant step size algorithms. The main aim of this paper is to investigate the guarantees of their adaptive step size counterparts. This dynamical point of view can in fact be used to design step size update rules, by choosing a discretization of the continuous equation that preserves its most relevant features. In this work, we analyse adaptive optimizers of this kind and prove their Lyapunov stability and convergence properties for any choice of hyperparameters. To the best of our knowledge, this paper is the first to use continuous selection theory from general topology to overcome some of the intrinsic difficulties caused by non-constant and non-regular step size policies. The general framework developed here yields many new results on adaptive and constant step size Momentum/Heavy-Ball and p-GD algorithms.
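To make the dynamical point of view concrete, the following is a minimal sketch and not the scheme analysed in this paper. It treats plain gradient descent as the explicit Euler discretization of the gradient flow x'(t) = -∇f(x(t)), along which f is a Lyapunov function since d/dt f(x(t)) = -||∇f(x(t))||². The step size is then adapted by backtracking until a discrete analogue of that decrease holds. The test function, the names `adaptive_gd`, `c`, `shrink`, and the specific update rule are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical test problem: an ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 10.0])


def f(x):
    return 0.5 * float(x @ A @ x)


def grad_f(x):
    return A @ x


def adaptive_gd(x0, eta0=1.0, c=0.5, shrink=0.5, n_steps=200):
    """Explicit Euler on the gradient flow with a backtracking step size.

    A step is accepted once f(x_new) - f(x) <= -c * eta * ||grad f(x)||^2,
    a discrete counterpart of d/dt f(x(t)) = -||grad f(x(t))||^2.
    """
    x = np.asarray(x0, dtype=float)
    eta = eta0
    values = [f(x)]
    for _ in range(n_steps):
        g = grad_f(x)
        g2 = float(g @ g)
        if g2 == 0.0:
            break  # stationary point reached
        # Shrink the step until the discrete decrease condition holds
        # (the lower bound on eta only guards against rounding issues).
        while eta > 1e-12 and f(x - eta * g) - f(x) > -c * eta * g2:
            eta *= shrink
        x = x - eta * g
        values.append(f(x))
        # Let the step size grow again, capped at its initial value.
        eta = min(eta / shrink, eta0)
    return x, values


x_final, values = adaptive_gd([1.0, 1.0])
print("final objective value:", values[-1])
```

In this sketch the step size is neither constant nor a smooth function of the iterate, which hints at why adaptive policies are harder to analyse than constant ones.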