Despite the success of the Adam optimizer in practice, the theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam establish convergence rates that can also be achieved by simple non-adaptive algorithms like SGD. In this work, we provide a different perspective, based on online learning, that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider the framework of online learning of updates/increments, in which the updates/increments of an optimizer are chosen by an online learner. Within this framework, the design of a good optimizer reduces to the design of a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of its algorithmic components from the online learning perspective.