Despite their popularity in continuous optimisation, second-order quasi-Newton methods are challenging to apply in machine learning, because the Hessian matrix is intractably large. This computational burden is exacerbated by the need to address non-convexity, for instance by modifying the Hessian's eigenvalues as in Saddle-Free Newton methods. We propose an optimisation algorithm which addresses both of these concerns; to our knowledge, it is the first efficiently scalable optimisation algorithm to asymptotically use the exact inverse Hessian with absolute-value eigenvalues. Our method frames the problem as a series which takes the principal square root of the squared Hessian and inverts it, then uses the result to precondition a gradient vector, all without explicitly computing or eigendecomposing the Hessian. A truncation of this infinite series provides a new optimisation algorithm which is scalable and comparable to other first- and second-order optimisation methods in both runtime and optimisation performance. We demonstrate this in a variety of settings, including a ResNet-18 trained on CIFAR-10.
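To make the idea of preconditioning with an absolute-value inverse Hessian through a series concrete, the following is a minimal sketch and not necessarily the series derived in this work. It assumes the textbook binomial expansion (H^2)^{-1/2} = c^{-1/2} * sum_k a_k (I - H^2/c)^k, with a_0 = 1 and a_k = a_{k-1} (2k - 1) / (2k), which converges when c is at least the largest eigenvalue of H^2 and H has no zero eigenvalues. The names `saddle_free_step`, `hvp`, `scale` and `toy_hvp` are hypothetical, and the 2x2 Hessian is a toy stand-in for a network's Hessian-vector product.

```python
import math


def saddle_free_step(hvp, grad, scale, num_terms=200):
    """Approximate |H|^{-1} grad using only Hessian-vector products.

    Assumption: truncated binomial series for (H^2)^{-1/2}, valid when
    `scale` >= largest eigenvalue of H^2 and H is non-singular.
    `hvp(v)` must return H @ v as a list of floats.
    """
    term = list(grad)    # holds (I - H^2/scale)^k grad, starting at k = 0
    total = list(grad)   # running series sum; the k = 0 coefficient is 1
    coeff = 1.0
    for k in range(1, num_terms):
        h2_term = hvp(hvp(term))  # H^2 @ term via two Hessian-vector products
        term = [t - h / scale for t, h in zip(term, h2_term)]
        coeff *= (2 * k - 1) / (2 * k)  # binomial coefficient recurrence
        total = [s + coeff * t for s, t in zip(total, term)]
    return [s / math.sqrt(scale) for s in total]


def toy_hvp(v):
    """Hessian-vector product for the indefinite toy Hessian [[1, 2], [2, 1]]."""
    return [v[0] + 2.0 * v[1], 2.0 * v[0] + v[1]]


# Eigenvalues of the toy Hessian are 3 and -1, so H^2 has largest eigenvalue 9.
step = saddle_free_step(toy_hvp, [1.0, 0.0], scale=9.0)
print(step)
```

For this toy Hessian, |H| = [[2, 1], [1, 2]], so the exact preconditioned gradient is (2/3, -1/3), and the truncated series approaches that value as `num_terms` grows. The Hessian is never formed or eigendecomposed inside `saddle_free_step`; only `hvp` is called. In a neural-network setting `hvp` would typically come from automatic differentiation, and a bound on the largest eigenvalue would have to be estimated separately.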