This report studies the fitting of the Hessian or its inverse for stochastic optimization using the Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is closely related to many commonly used second-order and adaptive gradient optimizers, e.g., BFGS, the Gauss-Newton algorithm, natural gradient descent, and AdaGrad. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods: closed-form versus iterative solutions, methods using Hessian-vector products versus stochastic gradients only, and Hessian fitting in the Euclidean space, on the manifold of symmetric positive definite (SPD) matrices, and on a variety of Lie groups. The most intriguing discovery is that Hessian fitting itself, viewed as an optimization problem, is strongly convex under mild conditions on certain general Lie groups. This discovery turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant sparse Lie group preconditioner fitting methods for large-scale stochastic optimization.
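A minimal sketch of the kind of Hessian fitting described above: the PSGD criterion fits a preconditioner P = QᵀQ by minimizing E[hᵀPh + vᵀP⁻¹v] over random probe vectors v and Hessian-vector products h = Hv, whose minimizer is P = H⁻¹. The toy Hessian, step-size schedule, and step normalization below are illustrative assumptions, not the report's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
H = np.diag([1.0, 2.0, 3.0])  # toy SPD Hessian (an assumption for illustration)

# Fit P = Q^T Q to minimize E[h^T P h + v^T P^{-1} v]; the minimizer is P = H^{-1}.
Q = np.eye(n)
for i in range(8000):
    v = rng.standard_normal(n)        # random probe vector
    h = H @ v                         # Hessian-vector product
    a = Q @ h
    b = np.linalg.solve(Q.T, v)       # b = Q^{-T} v
    grad = np.outer(a, a) - np.outer(b, b)  # relative gradient on the group
    lr = max(0.1 * 0.999 ** i, 0.001)       # decaying step size
    Q -= (lr / (a @ a + b @ b)) * grad @ Q  # normalized multiplicative update

P = Q.T @ Q
print(np.linalg.norm(P @ H - np.eye(n)))  # small residual: P approximates H^{-1}
```

The multiplicative update Q ← (I − μ(aaᵀ − bbᵀ))Q keeps Q in the general linear group, which is one concrete instance of the Lie group fitting the abstract refers to.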