Deep equilibrium models (DEQs) realize an effectively infinite-depth network representation without stacking layers by solving for a fixed point of a layer transformation. Despite requiring significantly less memory, such models achieve performance comparable to state-of-the-art methods in many large-scale numerical experiments. However, DEQs demand far more computation time for training and inference than conventional networks, because each input triggers repeated fixed-point iterations with no convergence guarantee. This study therefore explores an approach that restructures the model architecture to guarantee fixed-point convergence, thereby accelerating convergence and reducing computation time. Our proposed approach for image classification, the Lipschitz multiscale DEQ, theoretically guarantees fixed-point convergence for both the forward and backward passes through hyperparameter adjustment, achieving up to a 4.75$\times$ speed-up in numerical experiments on CIFAR-10 at the cost of a minor drop in accuracy.
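The convergence guarantee rests on the Banach fixed-point theorem: if the layer transformation is a contraction in its state argument (Lipschitz constant below 1), fixed-point iteration converges to a unique equilibrium. The following is a minimal sketch of that principle, not the paper's architecture; the layer `f`, the bound `L`, and spectral normalization of a single weight matrix are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: a toy layer f(z, x) made contractive in z by
# rescaling its weight matrix so that its spectral norm equals L < 1.
# By the Banach fixed-point theorem, z <- f(z, x) then converges to a
# unique equilibrium z* for every input x.
rng = np.random.default_rng(0)
d = 8
W = rng.standard_normal((d, d))
L = 0.9  # hypothetical Lipschitz-bound hyperparameter (must be < 1)
W *= L / np.linalg.norm(W, 2)  # spectral rescaling: ||W||_2 = L

def f(z, x):
    # tanh is 1-Lipschitz, so f is L-Lipschitz in z.
    return np.tanh(W @ z + x)

x = rng.standard_normal(d)
z = np.zeros(d)
for _ in range(500):  # error shrinks geometrically, ~ L**k
    z_next = f(z, x)
    if np.linalg.norm(z_next - z) < 1e-8:
        break
    z = z_next

# z now approximates the unique equilibrium z* = f(z*, x).
print(np.linalg.norm(f(z, x) - z))
```

Without the rescaling of `W`, the same iteration may diverge or cycle, which is the unguaranteed behavior the proposed architecture is designed to rule out.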