Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). We seek to combine the benefits of both approaches into a single computationally efficient algorithm. Noting that second-order methods often depend on stabilising heuristics (such as Levenberg-Marquardt damping), we propose AdamQLR: an optimiser that combines damping and learning rate selection techniques from K-FAC (Martens and Grosse, 2015) with the update directions proposed by Adam, inspired by viewing Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales, achieving a competitive trade-off between generalisation performance and runtime.
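As a rough illustration only, the combination described above can be sketched in NumPy as follows. This is not the authors' implementation: it takes the Adam update direction d, rescales it by the learning rate α = -gᵀd / (dᵀ(C + λI)d) that minimises a Levenberg-Marquardt-damped quadratic model of the loss along d, and then adapts the damping λ from the ratio ρ of actual to predicted loss reduction. The function names `adam_direction` and `damped_qlr_step`, the ρ thresholds (1/4 and 3/4), the adjustment factors (0.5 and 2.0), and the use of the damped model when computing ρ are all assumptions. A toy quadratic with exact curvature stands in for the curvature-vector products a real network would require.

```python
import numpy as np


def adam_direction(g, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam moment updates; returns the unscaled update direction."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)  # bias-corrected first moment
    v_hat = v / (1 - beta2**t)  # bias-corrected second moment
    d = -m_hat / (np.sqrt(v_hat) + eps)
    return d, m, v


def damped_qlr_step(theta, f, grad, curvature_vp, state,
                    decrease=0.5, increase=2.0):
    """Hypothetical step: Adam direction, learning rate from a damped
    quadratic model, then a Levenberg-Marquardt damping update.
    Thresholds and factors are illustrative assumptions."""
    g = grad(theta)
    state["t"] += 1
    d, state["m"], state["v"] = adam_direction(g, state["m"], state["v"], state["t"])
    lam = state["lam"]

    # Damped curvature along d: d^T (C + lam * I) d
    dCd = d @ curvature_vp(theta, d) + lam * (d @ d)
    # Minimiser of M(alpha) = f + alpha * g.d + 0.5 * alpha^2 * dCd
    alpha = -(g @ d) / dCd
    step = alpha * d

    predicted = alpha * (g @ d) + 0.5 * alpha**2 * dCd  # M(step) - M(0), <= 0
    actual = f(theta + step) - f(theta)
    rho = actual / predicted if predicted < 0 else 1.0

    # Trust the model more when it predicts well, less when it does not
    if rho > 0.75:
        state["lam"] = lam * decrease
    elif rho < 0.25:
        state["lam"] = lam * increase
    return theta + step, rho


# Toy quadratic f(theta) = 0.5 theta^T A theta - b^T theta (curvature is A exactly)
A = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])


def f(th):
    return 0.5 * th @ A @ th - b @ th


def grad(th):
    return A @ th - b


def hvp(th, d):
    return A @ d


theta = np.zeros(2)
state = {"m": np.zeros(2), "v": np.zeros(2), "t": 0, "lam": 1.0}
losses = [f(theta)]
for _ in range(20):
    theta, rho = damped_qlr_step(theta, f, grad, hvp, state)
    losses.append(f(theta))
print(losses[0], losses[-1], state["lam"])
```

Because the toy objective is exactly quadratic, the damped model slightly under-predicts the true decrease, so ρ ≥ 1 and λ shrinks after each step. On a real network, ρ falling below the lower threshold would instead raise λ and shorten subsequent steps.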