Despite their popularity in the field of continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, as the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns. To our knowledge, it is the first efficiently scalable optimisation algorithm to asymptotically use the exact (eigenvalue-modified) inverse Hessian. Our method frames the problem as a series which principally computes the inverse square root of the squared Hessian, then uses the result to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. Truncating this infinite series yields a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
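To make the idea of preconditioning with a series concrete, the following is a minimal sketch rather than the authors' algorithm. It assumes a generic binomial expansion of $(H^2)^{-1/2}$, which may differ from the series used in the paper, and a toy indefinite Hessian $H = \mathrm{diag}(2, -1)$. The names `hvp`, `abs_inv_hessian_times`, `scale` and `num_terms` are hypothetical. The Hessian is only ever accessed through Hessian-vector products, which in a neural network would come from automatic differentiation.

```python
def hvp(v):
    """Hessian-vector product for the toy objective f(x, y) = x**2 - 0.5 * y**2.

    Assumption for illustration: the Hessian is diag(2, -1), which is
    indefinite (a saddle point). The matrix itself is never formed.
    """
    return [2.0 * v[0], -1.0 * v[1]]


def abs_inv_hessian_times(g, hvp, scale, num_terms):
    """Approximate (H^2)^(-1/2) g with a truncated binomial series.

    Uses (H^2)^(-1/2) = (1/scale) * sum_k c_k (I - H^2 / scale^2)^k,
    where c_0 = 1 and c_k = c_{k-1} * (2k - 1) / (2k).
    The series converges when every eigenvalue of H satisfies
    0 < |lambda| < sqrt(2) * scale.
    """
    term = list(g)    # holds (I - H^2 / scale^2)^k g, starting at k = 0
    total = list(g)   # running sum; the k = 0 coefficient is 1
    coeff = 1.0
    for k in range(1, num_terms):
        hh = hvp(hvp(term))  # H^2 applied as two Hessian-vector products
        term = [t - h / scale**2 for t, h in zip(term, hh)]
        coeff *= (2 * k - 1) / (2 * k)
        total = [s + coeff * t for s, t in zip(total, term)]
    return [s / scale for s in total]


gradient = [4.0, 3.0]
step = abs_inv_hessian_times(gradient, hvp, scale=2.0, num_terms=200)
```

For this toy problem the truncated series approaches $|H|^{-1} g = (4/2,\ 3/1) = (2, 3)$ as `num_terms` grows. A plain Newton step would instead give $H^{-1} g = (2, -3)$, whose second component has the opposite sign and points along the negative-curvature direction towards the saddle. Each additional term costs two Hessian-vector products, so the truncation length trades accuracy against runtime.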