We present Exact Gauss-Newton (EGN), a stochastic second-order optimization algorithm that combines the generalized Gauss-Newton (GN) Hessian approximation with low-rank linear algebra to compute the descent direction. Leveraging the Duncan-Guttman matrix identity, we obtain the parameter update by factorizing a matrix whose dimension equals the mini-batch size. This is particularly advantageous for large-scale machine learning problems, where the dimension of the neural network parameter vector is several orders of magnitude larger than the batch size. Additionally, we show how improvements such as line search, adaptive regularization, and momentum can be seamlessly added to EGN to further accelerate the algorithm. Moreover, under mild assumptions, we prove that our algorithm converges in expectation to a stationary point of the objective. Finally, our numerical experiments demonstrate that EGN consistently exceeds, or at worst matches, the generalization performance of well-tuned SGD, Adam, GAF, SQN, and SGN optimizers across various supervised and reinforcement learning tasks.
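To make the batch-space computation concrete, the following is a minimal NumPy sketch for the special case of a mean-squared-error loss, where the generalized GN Hessian reduces to J^T J and a push-through identity of the Duncan-Guttman/Woodbury family moves the solve from parameter space (n x n) to batch space (b x b). The function name egn_direction and the variables J (per-sample Jacobian), r (residuals), and lam (regularization parameter) are illustrative assumptions, not the paper's API; line search, adaptive regularization, and momentum are omitted.

```python
import numpy as np

def egn_direction(J, r, lam):
    """Descent direction -(J^T J + lam*I_n)^{-1} J^T r, computed in batch space.

    By the push-through identity
        (J^T J + lam*I_n)^{-1} J^T = J^T (J J^T + lam*I_b)^{-1},
    only a b x b system is factorized, where b is the batch size and
    n (the number of parameters) may be orders of magnitude larger.
    """
    b = J.shape[0]
    G = J @ J.T + lam * np.eye(b)   # b x b Gram matrix
    z = np.linalg.solve(G, r)       # O(b^3) factorization, independent of n
    return -(J.T @ z)               # lift the solution back to parameter space

# Illustrative check against the naive n x n solve on a tiny problem.
rng = np.random.default_rng(0)
b, n, lam = 8, 50, 1e-3
J = rng.standard_normal((b, n))     # per-sample Jacobian of the model output
r = rng.standard_normal(b)          # residuals f(w; x) - y
d_fast = egn_direction(J, r, lam)
d_naive = -np.linalg.solve(J.T @ J + lam * np.eye(n), J.T @ r)
assert np.allclose(d_fast, d_naive, atol=1e-8)
```

Under these assumptions the per-step linear-algebra cost is dominated by forming the b x b Gram matrix and its factorization, which is what makes the method attractive when n >> b.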