Adaptive optimization methods are among the most popular approaches for training Deep Neural Networks (DNNs). Methods such as Adam, AdaGrad, and AdaHessian use a preconditioner that modifies the search direction by incorporating information about the curvature of the objective function. Despite their adaptive nature, however, these methods still require manual tuning of the step size, which in turn affects the time required to solve a given problem. This paper presents an optimization framework named SANIA to address these challenges. Beyond eliminating the need to set the step-size hyperparameter manually, SANIA incorporates techniques for handling poorly scaled or ill-conditioned problems. We also explore several preconditioning methods, including Hutchinson's method, which approximates the diagonal of the Hessian of the loss function. We conclude with an extensive empirical evaluation of the proposed techniques on classification tasks, covering both convex and non-convex settings.
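As an illustration of the Hutchinson-style preconditioning mentioned above, the following minimal sketch estimates the Hessian diagonal by averaging z ⊙ (Hz) over random Rademacher vectors z (entries ±1). It is not SANIA's implementation: the names `hutchinson_diagonal` and `make_hvp`, the sample count, and the small explicit matrix `H` are assumptions chosen for illustration. In practice the Hessian-vector product would typically come from automatic differentiation rather than an explicit matrix.

```python
import random


def hutchinson_diagonal(hvp, dim, num_samples=2000, seed=0):
    """Estimate diag(H) as the average of z * (H z) over Rademacher vectors z.

    `hvp` is any callable returning the Hessian-vector product H v as a list.
    """
    rng = random.Random(seed)
    estimate = [0.0] * dim
    for _ in range(num_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            estimate[i] += z[i] * hz[i]
    return [e / num_samples for e in estimate]


def make_hvp(matrix):
    """Hypothetical helper: build an hvp callable from an explicit matrix."""
    def hvp(v):
        return [sum(a * x for a, x in zip(row, v)) for row in matrix]
    return hvp


# Illustrative symmetric "Hessian"; its true diagonal is [4.0, 3.0, 2.0].
H = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.0, 0.5, 2.0],
]
diag_estimate = hutchinson_diagonal(make_hvp(H), dim=3)
```

The off-diagonal terms contribute zero-mean noise, so the estimate approaches the true diagonal as `num_samples` grows. For a purely diagonal matrix the estimator is exact with a single sample.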