This paper studies the fitting of the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion from the preconditioned stochastic gradient descent (PSGD) method, which is intimately related to many commonly used second-order and adaptive gradient optimizers, e.g., BFGS, Gauss-Newton, natural gradient descent, and AdaGrad. Our analyses reveal the efficiency and reliability differences among a wide range of preconditioner fitting methods, from closed-form to iterative solutions, using Hessian-vector products or stochastic gradients only, with Hessian fitting in the Euclidean space, the manifold of symmetric positive definite (SPD) matrices, or a variety of Lie groups. The most intriguing discovery is that Hessian fitting itself, as an optimization problem, is strongly convex under mild conditions on a specific yet sufficiently general Lie group. This discovery turns Hessian fitting into a well-behaved optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimization.
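The abstract does not write out the fitting criterion. As a minimal sketch, assume the form commonly given in the PSGD literature: c(P) = E[hᵀPh + vᵀP⁻¹v], with h = Hv a Hessian-vector product and v drawn from a standard normal distribution. Under that assumption the expectation reduces to tr(PH²) + tr(P⁻¹), and its closed-form minimizer is P = (H²)^(-1/2), which equals H⁻¹ when H is SPD. The function names and the toy matrix below are hypothetical and serve only to illustrate the closed-form case; they are not the paper's implementation.

```python
import numpy as np

def fitting_criterion(P, H):
    """Assumed PSGD criterion E[h^T P h + v^T P^{-1} v], h = H v, v ~ N(0, I).

    For symmetric H and v ~ N(0, I), the expectation is
    tr(P H^2) + tr(P^{-1}), which is what this function evaluates.
    """
    return np.trace(P @ H @ H) + np.trace(np.linalg.inv(P))

def closed_form_minimizer(H):
    """Return P = (H^2)^{-1/2}, computed via an eigendecomposition of H^2."""
    eigvals, eigvecs = np.linalg.eigh(H @ H)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Toy SPD "Hessian" (a hypothetical test matrix, not data from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4.0 * np.eye(4)

P_star = closed_form_minimizer(H)

# For SPD H the minimizer coincides with the inverse Hessian.
matches_inverse = np.allclose(P_star, np.linalg.inv(H))

# Any other SPD candidate should give a larger criterion value.
P_other = P_star + 0.01 * np.eye(4)
gap = fitting_criterion(P_other, H) - fitting_criterion(P_star, H)
```

In this sketch `matches_inverse` checks that the fitted preconditioner recovers H⁻¹, and `gap` measures how much the criterion increases at a nearby SPD matrix. The iterative and Lie group fitting methods that the abstract refers to are not shown here.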