We describe an exact algorithm for solving linear systems $Hx=b$, where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or its inverse, in time and storage that scale linearly in the number of layers. The naive approach of first computing the Hessian and then solving the linear system requires storage quadratic in the number of parameters and a cubic number of operations. By contrast, our Hessian-inverse-vector product method scales roughly like Pearlmutter's algorithm for computing Hessian-vector products.
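To make the comparison concrete, the following JAX sketch contrasts the two reference points named above: the naive baseline that materializes the full Hessian and solves the system densely, and a Pearlmutter-style Hessian-vector product that never forms $H$. It is not the Hessian-inverse-vector algorithm itself. The network shapes, the loss, and the names `SHAPES`, `unflatten`, `loss`, `hvp`, and `naive_solve` are illustrative assumptions, not taken from the paper.

```python
import jax
import jax.numpy as jnp

# Hypothetical weight shapes for a tiny 3-layer tanh network (an assumption).
SHAPES = [(4, 3), (4, 4), (1, 4)]
NUM_PARAMS = sum(r * c for r, c in SHAPES)


def unflatten(w):
    """Split a flat parameter vector into per-layer weight matrices."""
    mats, i = [], 0
    for r, c in SHAPES:
        mats.append(w[i:i + r * c].reshape(r, c))
        i += r * c
    return mats


def loss(w, X, y):
    """Mean squared error of the network on inputs X (N x 3), targets y (N,)."""
    *hidden, last = unflatten(w)
    h = X.T                          # features x batch
    for W in hidden:
        h = jnp.tanh(W @ h)
    pred = (last @ h)[0]             # shape (N,)
    return 0.5 * jnp.mean((pred - y) ** 2)


def hvp(w, v, X, y):
    """Hessian-vector product H v without forming H (forward-over-reverse)."""
    grad_fn = lambda p: jax.grad(loss)(p, X, y)
    return jax.jvp(grad_fn, (w,), (v,))[1]


def naive_solve(w, b, X, y):
    """Baseline: build the P x P Hessian, then solve H x = b densely.

    Storage is quadratic and the solve is cubic in the parameter count P.
    """
    H = jax.hessian(loss)(w, X, y)
    return jnp.linalg.solve(H, b)


# Random illustrative data; none of these values come from the paper.
k1, k2, k3, k4 = jax.random.split(jax.random.PRNGKey(0), 4)
w = jax.random.normal(k1, (NUM_PARAMS,))
X = jax.random.normal(k2, (16, 3))
y = jax.random.normal(k3, (16,))
v = jax.random.normal(k4, (NUM_PARAMS,))

H_dense = jax.hessian(loss)(w, X, y)      # P x P matrix, for comparison only
hv_matrix_free = hvp(w, v, X, y)          # same product, no P x P storage
x_naive = naive_solve(w, v, X, y)         # dense reference solve
```

In this sketch, `hvp` costs a small constant multiple of one gradient evaluation, whereas `naive_solve` needs the full `NUM_PARAMS` by `NUM_PARAMS` matrix. The Hessian of a deep net can be indefinite or ill-conditioned, so the dense solve here is meant only as a cost baseline.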