This report investigates the fitting of the Hessian or its inverse for stochastic optimization using a Hessian fitting criterion derived from the preconditioned stochastic gradient descent (PSGD) method. This criterion is closely related to many widely used second-order and adaptive gradient methods, including BFGS, the Gauss-Newton algorithm, natural gradient descent, and AdaGrad. Our analyses reveal differences in efficiency and reliability across a broad range of preconditioner fitting methods: closed-form and iterative solutions, methods relying on Hessian-vector products or on stochastic gradients alone, and fittings carried out in various geometric settings (Euclidean space, the manifold of symmetric positive definite (SPD) matrices, and a variety of Lie groups). The most intriguing finding is that, under mild conditions, the Hessian fitting problem is strongly convex in certain general Lie groups. This result turns Hessian fitting into a well-behaved Lie group optimization problem and facilitates the design of highly efficient and elegant Lie group sparse preconditioner fitting methods for large-scale stochastic optimization.
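For concreteness, the PSGD Hessian fitting criterion referred to above can be sketched as follows, with notation assumed from the PSGD literature rather than defined in this abstract: for an SPD preconditioner $P$ and a pair $(v, h)$ with $h = Hv$ (a Hessian-vector product, or a gradient difference induced by a parameter perturbation $v$), the criterion is

\[
c(P) \;=\; \mathbb{E}\!\left[\, h^{\top} P\, h \;+\; v^{\top} P^{-1} v \,\right],
\]

whose minimizer satisfies $P\,\mathbb{E}[h h^{\top}]\,P = \mathbb{E}[v v^{\top}]$ and reduces to $P = H^{-1}$ when $H$ is SPD.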
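To make the Lie group fitting concrete, below is a minimal Python sketch of one fitting step on the group of nonsingular upper-triangular matrices, assuming the factorization $P = Q^{\top} Q$ and the dense triangular-factor update style found in the PSGD literature; the function name, step-size normalization, and damping constant are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_triangular

def update_precond_triangular(Q, v, h, mu=0.1):
    """One PSGD-style fitting step for the upper-triangular factor Q of the
    preconditioner P = Q.T @ Q, driven by a vector-Hessian-vector pair
    (v, h) with h approximately equal to H v.  Sketch only: the step-size
    normalization and damping constant are illustrative choices."""
    a = Q @ h                                        # a = Q h
    b = solve_triangular(Q.T, v, lower=True)         # b = Q^{-T} v
    grad = np.triu(np.outer(a, a) - np.outer(b, b))  # group gradient of the criterion
    step = mu / (np.max(np.abs(grad)) + 1e-12)       # keep the update a small group element
    return Q - step * (grad @ Q)                     # Q <- (I - step*grad) Q stays triangular

# Toy usage: fit P = Q.T @ Q to the inverse of a fixed SPD Hessian H.
rng = np.random.default_rng(0)
n = 10
A = rng.standard_normal((n, n))
H = A @ A.T + n * np.eye(n)                          # a well-conditioned SPD Hessian
Q = np.eye(n)
for _ in range(2000):
    v = rng.standard_normal(n)
    Q = update_precond_triangular(Q, v, H @ v)
print(np.linalg.norm(Q.T @ Q @ H - np.eye(n)))       # expected to shrink as the fit improves
```

Because both `grad` and `Q` are upper triangular, the multiplicative update stays inside the triangular matrix group, which is what permits the strong convexity and sparse structure highlighted above.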