Many machine learning applications and tasks rely on the stochastic gradient descent (SGD) algorithm and its variants. Effective step length selection is crucial to the success of these algorithms, a need that has motivated the development of methods such as ADAM and AdaGrad. In this paper, we propose a novel algorithm for adaptive step length selection in the classical SGD framework, which can be readily adapted to other stochastic algorithms. The proposed algorithm is inspired by traditional nonlinear optimization techniques and is supported by theoretical analysis. We show that, under reasonable conditions, the algorithm produces step lengths in line with well-established theoretical requirements and generates iterates that converge in expectation to a stationary neighborhood of a solution. We test the proposed algorithm on logistic regression and deep neural networks and demonstrate that it can generate step lengths comparable to the best step length obtained from manual tuning.