Second-order optimization methods such as the generalized Gauss-Newton method are considered more powerful than first-order methods because they exploit the curvature of the objective function through preconditioning matrices. Despite their appealing theoretical benefits, they are not easily applicable to modern deep learning. The main obstacle is the quadratic memory and cubic time complexity of computing the inverse of the preconditioning matrix, which is infeasible even on state-of-the-art hardware. In this work, we propose Ginger, an eigendecomposition for the inverse of the generalized Gauss-Newton matrix. Our method has linear memory and time complexity per iteration. Instead of approximating the preconditioning matrix, we directly maintain its inverse, which makes the approximation more accurate. We provide a convergence result for Ginger on non-convex objectives. Experiments on different tasks with different model architectures verify the effectiveness of our method. Our code is publicly available.
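To illustrate why keeping an inverse in eigendecomposed form can cost only linear memory and time, here is a minimal sketch. It is not Ginger's update rule, which the abstract does not specify. It assumes, purely for illustration, a preconditioner of the form `damping * I + U diag(eigvals) U^T` with `k` orthonormal eigenvectors of length `d`. The function name `apply_inverse` and all parameter names are hypothetical.

```python
def apply_inverse(eigvecs, eigvals, damping, grad):
    """Return (damping * I + U diag(eigvals) U^T)^{-1} @ grad.

    Illustrative assumption: `eigvecs` is a list of k orthonormal vectors
    (the columns of U), each of length d. The d x d matrix is never formed,
    so memory and time are O(d * k) rather than O(d^2) and O(d^3).
    """
    # Directions orthogonal to every eigenvector are scaled by 1 / damping.
    out = [g / damping for g in grad]
    for u, s in zip(eigvecs, eigvals):
        # Component of the gradient along this eigenvector.
        coeff = sum(ui * gi for ui, gi in zip(u, grad))
        # Along u the inverse scales by 1 / (damping + s), not 1 / damping.
        correction = 1.0 / (damping + s) - 1.0 / damping
        for i, ui in enumerate(u):
            out[i] += correction * coeff * ui
    return out


# Usage: the implied matrix is diag(4, 1, 1), so the result is grad / (4, 1, 1).
step = apply_inverse([[1.0, 0.0, 0.0]], [3.0], 1.0, [4.0, 2.0, 2.0])
print(step)  # [1.0, 2.0, 2.0]
```

The sketch stores only the `k` eigenvectors and eigenvalues, which is the kind of structure that avoids forming or inverting a dense matrix.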