Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of first-order optimization under an $L$-smoothness condition and a bounded noise variance assumption. However, a thorough review of the existing literature on Adam's convergence reveals a noticeable gap: none of the existing results matches this lower bound. In this paper, we close the gap by deriving a new convergence guarantee for Adam that requires only an $L$-smoothness condition and a bounded noise variance assumption. Our results remain valid across a broad spectrum of hyperparameters. In particular, with properly chosen hyperparameters, we derive an upper bound on the iteration complexity of Adam and show that it matches the lower bound for first-order optimizers. To the best of our knowledge, this is the first work to establish such a tight upper bound for Adam's convergence. Our proof uses novel techniques to handle the entanglement between momentum and the adaptive learning rate and to convert the first-order term in the Descent Lemma into the gradient norm, which may be of independent interest.
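To make the entanglement between momentum and the adaptive learning rate concrete, the following is a minimal sketch of one Adam step in its commonly used textbook form with bias correction. It is an illustration only: the variant analyzed in the paper may differ (for example in bias correction or in where `eps` is placed), and the function name `adam_step` and all default hyperparameter values are assumptions introduced here.

```python
import math


def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update to a list of floats.

    x, grad, m, v are lists of equal length; t is the 1-based step count.
    Returns the updated (x, m, v) as new lists.
    """
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, grad, m, v):
        # Momentum: exponential moving average of the stochastic gradient.
        mi = beta1 * mi + (1 - beta1) * gi
        # Adaptive scaling: exponential moving average of the squared gradient.
        vi = beta2 * vi + (1 - beta2) * gi * gi
        # Bias correction (an assumption of this sketch, not taken from the paper).
        m_hat = mi / (1 - beta1 ** t)
        v_hat = vi / (1 - beta2 ** t)
        # Numerator and denominator both depend on the same noisy gradients,
        # so the update direction and the step size are statistically coupled.
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


if __name__ == "__main__":
    # Toy usage: minimize f(x) = 0.5 * x^2, whose exact gradient is x.
    x, m, v = [1.0], [0.0], [0.0]
    for t in range(1, 501):
        x, m, v = adam_step(x, list(x), m, v, t, lr=0.01)
    print(x)
```

Because `m_hat` and `v_hat` are built from the same stochastic gradients, the expectation of their ratio cannot simply be factored into an expected descent direction times a deterministic step size, which is the difficulty the abstract refers to.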