Adam-type algorithms have become a preferred choice for optimisation in deep learning; however, despite their success, their convergence is still not well understood. To this end, we introduce a unified framework for Adam-type algorithms, called UAdam. It is equipped with a general form of the second-order moment, which makes it possible to include Adam and its variants, such as NAdam, AMSGrad, AdaBound, AdaFom, and Adan, as special cases. The framework is supported by a rigorous convergence analysis of UAdam in the non-convex stochastic setting, showing that UAdam converges to a neighborhood of stationary points at a rate of $\mathcal{O}(1/T)$. Furthermore, the size of the neighborhood decreases as $\beta$ increases. Importantly, our analysis only requires the first-order momentum factor to be close enough to 1, without any restrictions on the second-order momentum factor. The theoretical results also show that vanilla Adam can converge when appropriate hyperparameters are selected, which provides a theoretical guarantee for the analysis, applications, and further development of the whole class of Adam-type algorithms.
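To illustrate what a "general form of the second-order moment" can look like in practice, the following is a minimal sketch, not the UAdam algorithm itself (the abstract does not state its update rule). It assumes a textbook Adam-type step without bias correction in which the second-moment rule is a pluggable function; the names `adam_type_minimize`, `adam_second_moment`, and `amsgrad_second_moment` are hypothetical and chosen only for this example.

```python
import math


def adam_second_moment(v, v_hat, g, beta2):
    """Adam-style rule: exponential moving average of squared gradients."""
    v_new = beta2 * v + (1.0 - beta2) * g * g
    return v_new, v_new


def amsgrad_second_moment(v, v_hat, g, beta2):
    """AMSGrad-style rule: same average, but the denominator never shrinks."""
    v_new = beta2 * v + (1.0 - beta2) * g * g
    return v_new, max(v_hat, v_new)


def adam_type_minimize(grad_fn, x0, second_moment, lr=0.01,
                       beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Generic Adam-type loop (illustrative; no bias correction).

    grad_fn maps a list of floats to a list of gradients.
    second_moment is the pluggable rule returning (v, v_hat) per coordinate.
    """
    x = list(x0)
    n = len(x)
    m = [0.0] * n      # first-order momentum
    v = [0.0] * n      # running second-order moment
    v_hat = [0.0] * n  # value actually used in the denominator
    for _ in range(steps):
        g = grad_fn(x)
        for i in range(n):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
            v[i], v_hat[i] = second_moment(v[i], v_hat[i], g[i], beta2)
            x[i] -= lr * m[i] / (math.sqrt(v_hat[i]) + eps)
    return x


# Usage: minimise f(x) = 0.5 * ||x||^2, whose gradient is x itself.
quadratic_grad = lambda x: list(x)
x_adam = adam_type_minimize(quadratic_grad, [1.0, -2.0], adam_second_moment)
x_ams = adam_type_minimize(quadratic_grad, [1.0, -2.0], amsgrad_second_moment)
```

Swapping the `second_moment` argument is the only change needed to move between the two variants here, which is the sense in which a single template can cover several Adam-type methods. The toy problem is deterministic, so it does not exercise the stochastic non-convex setting that the convergence analysis addresses.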