A deep equilibrium model (DEQ) is defined implicitly as the equilibrium point of an infinite-depth, weight-tied model with input injection. Rather than carrying out infinitely many layer computations, it solves for the equilibrium point directly by root-finding and computes gradients by implicit differentiation. In this study, we investigate the training dynamics of over-parameterized DEQs. Under a condition on the initial equilibrium point, we show that a unique equilibrium point exists throughout training, and we prove that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. To show that the required initial condition is satisfied under mild over-parameterization, we perform a fine-grained analysis of random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the non-asymptotic analysis of infinite-depth weight-tied models.
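To make the forward solve and implicit differentiation concrete, the following is a minimal NumPy sketch, not the construction analyzed in this work. It assumes a hypothetical layer of the form z = tanh(W z + U x), rescales W to spectral norm 0.5 so that the map is a contraction and the equilibrium is unique, and uses plain fixed-point iteration in place of a dedicated root-finder. The names `deq_forward` and `deq_grad_x`, the tanh activation, and the scaling constant are all illustrative assumptions.

```python
import numpy as np

def deq_forward(W, U, x, tol=1e-12, max_iter=10_000):
    """Find z* with z* = tanh(W z* + U x) by plain fixed-point iteration.

    Assumes the spectral norm of W is below 1, so the map is a contraction.
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def deq_grad_x(W, U, x, y):
    """Gradient of 0.5 * ||z*(x) - y||^2 w.r.t. x via implicit differentiation."""
    z = deq_forward(W, U, x)
    d = 1.0 - z**2  # tanh' at the equilibrium, since z = tanh(pre-activation)
    # Implicit function theorem: (I - D W) dz/dx = D U, with D = diag(d).
    A = np.eye(len(z)) - d[:, None] * W
    # Solve the adjoint system A^T v = (z - y); then grad = U^T D v.
    v = np.linalg.solve(A.T, z - y)
    return U.T @ (d * v)

# Illustrative random instance (sizes and seed are arbitrary assumptions).
rng = np.random.default_rng(0)
n, m = 8, 3
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)  # rescale to spectral norm 0.5
U = rng.standard_normal((n, m))
x = rng.standard_normal(m)
y = rng.standard_normal(n)

z_star = deq_forward(W, U, x)
grad = deq_grad_x(W, U, x, y)
```

The backward pass never unrolls the iteration: it needs only the equilibrium `z_star` and one linear solve, which is the practical appeal of implicit differentiation in this setting.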