Despite the practical success of the Adam optimizer, theoretical understanding of its algorithmic components remains limited. In particular, most existing analyses of Adam establish only convergence rates that non-adaptive algorithms such as SGD can already achieve. In this work, we offer a different perspective, based on online learning, that underscores the importance of Adam's algorithmic components. Inspired by Cutkosky et al. (2023), we consider a framework called online learning of updates, in which an optimizer's updates are chosen by an online learner. Under this framework, designing a good optimizer reduces to designing a good online learner. Our main observation is that Adam corresponds to a principled online learning framework called Follow-the-Regularized-Leader (FTRL). Building on this observation, we study the benefits of Adam's algorithmic components from the online learning perspective.
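To make the claimed correspondence concrete, the following is a minimal sketch, not the construction used in the paper. It assumes a scalar parameter, Adam without bias correction and with epsilon set to zero, and an FTRL-style learner that minimizes a discounted linear loss plus a quadratic regularizer with an adaptive scale. The abstract does not specify the exact FTRL instance, so the regularizer, the discounting, and all function names (`adam_update`, `discounted_ftrl_update`, `run`) are hypothetical choices made so that the algebra lines up.

```python
import math

def adam_update(grads, lr, beta1, beta2):
    """Adam step computed from the full gradient history.

    Assumption: no bias correction and eps = 0, to keep the algebra exact.
    """
    m = 0.0
    v = 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
    return -lr * m / math.sqrt(v)

def discounted_ftrl_update(grads, alpha, beta1, beta2):
    """FTRL-style step: argmin over d of  L * d + d**2 / (2 * eta).

    L is the beta1-discounted sum of past gradients (the linear losses),
    and eta = alpha / sqrt(S) is an adaptive scale built from the
    beta2-discounted sum of squared gradients. Both are illustrative choices.
    """
    t = len(grads)
    L = sum(beta1 ** (t - 1 - s) * g for s, g in enumerate(grads))
    S = sum(beta2 ** (t - 1 - s) * g * g for s, g in enumerate(grads))
    eta = alpha / math.sqrt(S)
    return -eta * L  # closed-form minimizer of the regularized objective

def run(grad, x0, steps, learner):
    """Online learning of updates: the learner picks each increment of x."""
    x, grads = x0, []
    for _ in range(steps):
        grads.append(grad(x))   # feedback revealed to the learner
        x += learner(grads)     # the optimizer applies the learner's output
    return x

lr, beta1, beta2 = 0.1, 0.9, 0.999
# Scale that makes the two updates coincide under the assumptions above.
alpha = lr * (1 - beta1) / math.sqrt(1 - beta2)

history = [-3.0, -2.5, 1.0, 0.25]
print(adam_update(history, lr, beta1, beta2))
print(discounted_ftrl_update(history, alpha, beta1, beta2))
```

Under these assumptions the two printed values agree up to floating-point rounding, because Adam's moving averages are rescaled discounted sums of the same gradients. Passing either function to `run` (for example through a `lambda` that fixes the hyperparameters) turns it into an optimizer, which illustrates how the design of the optimizer reduces to the design of the learner.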