RMSProp and ADAM remain widely favoured optimization algorithms for training neural networks. Their performance depends heavily on the choice of step size and can vary considerably from one step size to another. Questions about their theoretical convergence properties also continue to be a subject of interest. In this paper, we theoretically analyze a constant step size version of ADAM in the non-convex setting. We give sufficient conditions on the step size under which the gradients converge to zero asymptotically, almost surely, under minimal assumptions. We also provide runtime bounds for deterministic ADAM to reach approximate criticality on smooth, non-convex functions.
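To make the deterministic setting concrete, the following is a minimal sketch of full-gradient ADAM run with a constant step size until the gradient norm falls below a tolerance, which is one way to read "approximate criticality". It assumes the standard bias-corrected ADAM update; the abstract does not state the exact variant analyzed or the step size condition, so the test function `grad_f`, the function name `adam_constant_step`, and every hyperparameter value here are illustrative assumptions rather than the paper's choices.

```python
import math


def grad_f(x):
    """Gradient of f(x) = sum_i x_i^2 / (1 + x_i^2).

    Hypothetical test function: smooth and non-convex, chosen only for illustration.
    """
    return [2.0 * xi / (1.0 + xi * xi) ** 2 for xi in x]


def adam_constant_step(grad, x0, alpha=0.01, beta1=0.9, beta2=0.999,
                       eps=1e-8, tol=1e-3, max_iters=10_000):
    """Deterministic ADAM with a constant step size alpha (assumed standard variant).

    Returns (x, gradient norm at x, number of updates performed).
    Stops at the first iterate whose gradient norm is <= tol, or after max_iters updates.
    """
    x = list(x0)
    m = [0.0] * len(x)  # first-moment (mean) estimate
    v = [0.0] * len(x)  # second-moment (uncentered variance) estimate
    for t in range(1, max_iters + 1):
        g = grad(x)  # exact gradient: no stochastic noise
        gnorm = math.sqrt(sum(gi * gi for gi in g))
        if gnorm <= tol:
            return x, gnorm, t - 1
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * gi
            v[i] = beta2 * v[i] + (1.0 - beta2) * gi * gi
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            x[i] -= alpha * m_hat / (math.sqrt(v_hat) + eps)  # alpha never decays
    g = grad(x)
    return x, math.sqrt(sum(gi * gi for gi in g)), max_iters


if __name__ == "__main__":
    x, gnorm, steps = adam_constant_step(grad_f, [2.0, -1.0])
    print(f"stopped after {steps} updates, gradient norm = {gnorm:.2e}")
```

The returned update count is the empirical analogue of a runtime bound: the number of iterations needed before some iterate has gradient norm at most `tol`. Varying `alpha` in this sketch is a quick way to see how sensitive that count is to the step size.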