This paper studies the fitting of the Hessian or its inverse with stochastic Hessian-vector products. A Hessian fitting criterion, from which most commonly used methods (e.g., BFGS, Gauss-Newton, and AdaGrad) can be derived, is used for the analysis. Our studies reveal different convergence rates for different Hessian fitting methods: sublinear rates for gradient descent in the Euclidean space and for a commonly used closed-form solution, and linear rates for gradient descent on the manifold of symmetric positive definite (SPD) matrices and on certain Lie groups. The Hessian fitting problem is further shown to be strongly convex under mild conditions on a specific yet sufficiently general Lie group. To confirm our analysis, these methods are tested under different settings, such as noisy Hessian-vector products, time-varying Hessians, and low-precision arithmetic. These findings are useful for stochastic second-order optimization methods that rely on fast, robust, and accurate Hessian estimation.
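To make "Hessian fitting with stochastic Hessian-vector products on a Lie group" concrete, here is a minimal sketch under stated assumptions. It assumes the fitting criterion E[h^T P h + v^T P^{-1} v] with h = H v and v ~ N(0, I), whose minimizer over SPD matrices is P = H^{-1} when H is SPD. The abstract does not spell this criterion out, so treat it as an assumption. The sketch parameterizes P = Q^T Q with Q on the group of upper triangular matrices with positive diagonal, which is one example of a Lie group, and updates Q multiplicatively as Q <- (I - step * G) Q. The function name `fit_inverse_hessian`, the batch size, the step size `mu`, and the step normalization are hypothetical choices made for illustration, not the paper's method.

```python
import numpy as np


def fit_inverse_hessian(hvp, n, num_iters=5000, batch=8, mu=0.1, seed=0):
    """Hypothetical sketch: fit P = Q^T Q to H^{-1} using only products H @ V.

    Assumed criterion: E[h^T P h + v^T P^{-1} v], with h = H v and v ~ N(0, I).
    Q stays upper triangular with a positive diagonal (a Lie group), and each
    update is a multiplicative step Q <- (I - step * G) Q.
    """
    rng = np.random.default_rng(seed)
    Q = np.eye(n)
    for _ in range(num_iters):
        V = rng.standard_normal((n, batch))  # random probe vectors v (columns)
        Hv = hvp(V)                          # Hessian-vector products h = H v
        A = Q @ Hv                           # a = Q h, so h^T P h = ||a||^2
        B = np.linalg.solve(Q.T, V)          # b = Q^{-T} v, so v^T P^{-1} v = ||b||^2
        # Gradient with respect to E in Q -> (I + E) Q, projected onto the
        # Lie algebra of upper triangular matrices (factor 2 absorbed in mu).
        G = np.triu(A @ A.T - B @ B.T)
        # Assumed normalization: max diag(A A^T + B B^T) bounds |G_ii|, so the
        # diagonal of (I - mu / s * G) stays positive for mu < 1.
        s = np.max(np.sum(A * A, axis=1) + np.sum(B * B, axis=1))
        Q = Q - (mu / s) * (G @ Q)
    return Q.T @ Q


# Usage: a random SPD Hessian with eigenvalues between 1 and 10.
rng = np.random.default_rng(1)
n = 5
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = U @ np.diag(np.linspace(1.0, 10.0, n)) @ U.T
H_inv = np.linalg.inv(H)

P = fit_inverse_hessian(lambda V: H @ V, n)
rel_err = np.linalg.norm(P - H_inv) / np.linalg.norm(H_inv)
print(f"relative error of P against H^-1: {rel_err:.2e}")
```

A note on why a fixed step can work in this noise-free sketch: at P = H^{-1} we have Q^T Q H = I, so Q H = Q^{-T} and therefore a = b for every probe v. The stochastic gradient then vanishes exactly at the minimizer. With noisy Hessian-vector products this no longer holds, and a decaying step or averaging would likely be needed.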