The Adam optimizer, widely used in machine learning for neural network training, corresponds to an underlying ordinary differential equation (ODE) in the limit of very small learning rates. This work shows that the classical Adam algorithm is a first-order implicit-explicit (IMEX) Euler discretization of that ODE. Adopting this time-discretization point of view, we propose new extensions of the Adam scheme obtained by solving the ODE with higher-order IMEX methods. Based on this approach, we derive a new optimization algorithm for neural network training that performs better than classical Adam on several regression and classification problems.
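For orientation, the classical Adam update that the abstract reinterprets as a time-stepping scheme can be sketched as follows. This is a minimal illustration of the standard published Adam recurrences, not the paper's IMEX formulation or its higher-order extension; the underlying ODE is not stated in the abstract, so it is not reproduced here. The function names `adam_step` and `minimize_quadratic`, the quadratic test objective, and the hyperparameter values are assumptions chosen for illustration only.

```python
import numpy as np


def adam_step(theta, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of classical Adam (standard form, with bias correction).

    theta: parameters, m: first-moment estimate, v: second-moment estimate,
    grad: gradient at theta, t: 1-based step counter.
    """
    # The moment estimates are advanced first ...
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    # ... then bias-corrected ...
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    # ... and the parameter update uses the already-updated moments.
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(steps=3000, lr=1e-2):
    """Hypothetical toy problem: minimize 0.5 * ||theta - target||^2."""
    target = np.array([1.0, -2.0])
    theta = np.zeros(2)
    m = np.zeros(2)
    v = np.zeros(2)
    for t in range(1, steps + 1):
        grad = theta - target  # gradient of the quadratic objective
        theta, m, v = adam_step(theta, m, v, grad, t, lr=lr)
    return theta, target


if __name__ == "__main__":
    theta, target = minimize_quadratic()
    print("final parameters:", theta, "target:", target)
```

In this sketch the learning rate `lr` plays the role of the step size: the small-learning-rate limit mentioned in the abstract corresponds to letting it tend to zero, and the fact that the parameter update consumes the freshly updated moments is the kind of structure a partitioned time-stepping interpretation can exploit.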