In recent years, implicit deep learning has emerged as a method to increase the effective depth of deep neural networks. While implicit models are memory-efficient to train, they are still significantly slower to train than their explicit counterparts. In Deep Equilibrium Models (DEQs), training is formulated as a bi-level problem, and its computational complexity is partially driven by the iterative inversion of a huge Jacobian matrix. In this paper, we propose a novel strategy, SHINE, to tackle this computational bottleneck, from which many bi-level problems suffer. The main idea is to use the quasi-Newton matrices from the forward pass to efficiently approximate the inverse Jacobian matrix in the direction needed for the gradient computation. We provide a theorem that motivates using our method with the original forward algorithms. In addition, by modifying these forward algorithms, we provide theoretical guarantees that our method asymptotically estimates the true implicit gradient. We empirically study this approach and the recent Jacobian-Free method in different settings, ranging from hyperparameter optimization to large Multiscale DEQs (MDEQs) applied to CIFAR and ImageNet. Both methods significantly reduce the computational cost of the backward pass. While SHINE has a clear advantage on hyperparameter optimization problems, both methods attain similar computational performance on larger-scale problems such as MDEQs, at the cost of a limited performance drop compared to the original models.
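To make the main idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a toy linear layer f(z) = Az + θ, a squared-error loss, and Broyden's "good" update as the quasi-Newton forward solver; the names `broyden_solve`, `A`, `theta`, and `target` are hypothetical and chosen only for illustration. The forward pass returns both the fixed point and the inverse-Jacobian estimate `B`, and the backward pass reuses `B` instead of inverting the Jacobian. The Jacobian-Free variant is shown next to it, with the inverse replaced by the identity.

```python
import numpy as np


def broyden_solve(g, z0, max_iter=50, tol=1e-10):
    """Find a root of g with Broyden's 'good' method.

    Returns the approximate root and the final estimate B of the
    inverse Jacobian of g, built up during the iterations.
    """
    z = z0.copy()
    B = np.eye(z.size)              # initial guess for the inverse Jacobian
    gz = g(z)
    for _ in range(max_iter):
        if np.linalg.norm(gz) < tol:
            break
        s = -B @ gz                 # quasi-Newton step
        z_new = z + s
        g_new = g(z_new)
        y = g_new - gz
        Bts = B.T @ s               # the row vector s^T B, stored as a 1-D array
        # Sherman-Morrison form of the update; enforces the secant condition B y = s
        B = B + np.outer(s - B @ y, Bts) / (Bts @ y)
        z, gz = z_new, g_new
    return z, B


# Toy problem (an assumption for illustration): a linear contraction.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
A *= 0.2 / np.linalg.norm(A)        # Frobenius norm 0.2, so f is a contraction
theta = rng.standard_normal(n)
target = rng.standard_normal(n)


def f(z):
    return A @ z + theta            # layer whose fixed point defines the model


def g(z):
    return z - f(z)                 # root-finding residual


# Forward pass: solve for the fixed point and keep the quasi-Newton matrix.
z_star, B = broyden_solve(g, np.zeros(n))

# Backward pass for L = 0.5 * ||z_star - target||^2 with respect to theta.
grad_z = z_star - target            # dL/dz at the fixed point
J = np.eye(n) - A                   # Jacobian of g; df/dtheta is the identity here

grad_exact = np.linalg.solve(J.T, grad_z)   # true implicit gradient (inverts J)
grad_shine = B.T @ grad_z                   # reuses the forward-pass estimate
grad_jf = grad_z                            # Jacobian-Free: inverse taken as identity

print("error, shared inverse estimate:", np.linalg.norm(grad_shine - grad_exact))
print("error, Jacobian-Free:          ", np.linalg.norm(grad_jf - grad_exact))
```

In this sketch the only extra cost of the approximate backward pass is one matrix-vector product with `B`. How close either approximation gets to the exact gradient depends on the problem, and the toy example is not meant to rank the two methods.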