This paper introduces the second-order hyperplane search, a novel optimization step that generalizes a second-order line search from a line to a $k$-dimensional hyperplane. Combined with the forward-mode stochastic gradient method, this yields a second-order optimization algorithm that consists of forward passes only, completely avoiding the storage overhead of backpropagation. Unlike recent work that relies on directional derivatives (or Jacobian--vector products, JVPs), we use hyper-dual numbers to jointly evaluate both directional derivatives and their second-order terms. The result is forward-mode weight perturbation with Hessian information (FoMoH), which we then use to generalize line search to a hyperplane search. We illustrate the utility of this extension and how it can help overcome some of the recent challenges of optimizing machine learning models without backpropagation. Our code is open-sourced at https://github.com/SRI-CSL/fomoh.
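To make the hyper-dual idea concrete, here is a minimal scalar sketch (not the paper's implementation; all names are illustrative). A hyper-dual number $a + b\,\epsilon_1 + c\,\epsilon_2 + d\,\epsilon_1\epsilon_2$ with $\epsilon_1^2 = \epsilon_2^2 = (\epsilon_1\epsilon_2)^2 = 0$ lets a single forward pass through $f$ carry the function value, a directional derivative, and a second-order term:

```python
class HyperDual:
    """Hyper-dual scalar a + b*eps1 + c*eps2 + d*eps1*eps2,
    with eps1^2 = eps2^2 = (eps1*eps2)^2 = 0."""

    def __init__(self, real, e1=0.0, e2=0.0, e12=0.0):
        self.real, self.e1, self.e2, self.e12 = real, e1, e2, e12

    def __add__(self, o):
        o = o if isinstance(o, HyperDual) else HyperDual(o)
        return HyperDual(self.real + o.real, self.e1 + o.e1,
                         self.e2 + o.e2, self.e12 + o.e12)

    __radd__ = __add__

    def __mul__(self, o):
        # Product rule applied to each nilpotent component.
        o = o if isinstance(o, HyperDual) else HyperDual(o)
        return HyperDual(
            self.real * o.real,
            self.real * o.e1 + self.e1 * o.real,
            self.real * o.e2 + self.e2 * o.real,
            self.real * o.e12 + self.e1 * o.e2
            + self.e2 * o.e1 + self.e12 * o.real)

    __rmul__ = __mul__


def f(x):
    return x * x * x  # f(x) = x^3, so f'(x) = 3x^2, f''(x) = 6x


# One forward pass at x = 2 with unit perturbations in both dual parts.
x = HyperDual(2.0, e1=1.0, e2=1.0)
y = f(x)
print(y.real)  # f(2)   = 8
print(y.e1)    # f'(2)  = 12  (directional derivative)
print(y.e12)   # f''(2) = 12  (second-order term)
```

In the multivariate setting, seeding $\epsilon_1$ and $\epsilon_2$ with perturbation directions $\mathbf{v}$ and $\mathbf{w}$ yields $\mathbf{v}^\top \nabla f$ and the curvature term $\mathbf{v}^\top \mathbf{H} \mathbf{w}$ in the same pass, which is the quantity FoMoH exploits.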