A Simple Deep Equilibrium Model Converges to Global Optima with Weight Tying

A deep equilibrium linear model is implicitly defined through an equilibrium point of an infinite sequence of computations. It avoids any explicit computation of the infinite sequence by finding an equilibrium point directly via root-finding and by computing gradients via implicit differentiation. It is a simple deep equilibrium model in which the nonlinear activations are applied to the weight matrices. In this paper, we analyze the gradient dynamics of this simple deep equilibrium model with non-convex objective functions for a general class of losses used in regression and classification. Despite the non-convexity, convergence to a global optimum at a linear rate is guaranteed without any assumption on the width of the model, allowing the width to be smaller than the output dimension and the number of data points. Moreover, we prove a relation between the gradient dynamics of the simple deep equilibrium model and the dynamics of the trust-region Newton method applied to a shallow model. This mathematically proven relation, along with our numerical observations, suggests the importance of understanding implicit bias and points to a possible open problem on the topic. Our proofs deal with nonlinearity and weight tying, and differ from those in the related literature.
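To make the notion of an equilibrium point concrete, the following is a minimal sketch, not the exact model analyzed in the paper. It assumes a weight-tied layer of the form z = gamma * softmax(A) z + x, where the column-wise softmax, the scalar gamma in (0, 1), and all function names are illustrative assumptions that the abstract does not specify. Under these assumptions the layer is a contraction, so unrolling the sequence and solving the fixed-point equation directly should give the same answer.

```python
import numpy as np

def column_softmax(A):
    # Nonlinearity applied to the weight matrix (assumed form):
    # each column becomes nonnegative and sums to 1.
    E = np.exp(A - A.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def equilibrium_by_iteration(A, x, gamma=0.5, num_steps=200):
    # Explicitly unroll z <- gamma * softmax(A) z + x with tied weights.
    W = gamma * column_softmax(A)
    z = np.zeros_like(x)
    for _ in range(num_steps):
        z = W @ z + x
    return z

def equilibrium_by_solve(A, x, gamma=0.5):
    # Find the root of g(z) = z - (W z + x) directly; because g is
    # linear in z, root-finding reduces to one linear solve.
    W = gamma * column_softmax(A)
    return np.linalg.solve(np.eye(A.shape[0]) - W, x)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
x = rng.normal(size=4)

z_iter = equilibrium_by_iteration(A, x)
z_direct = equilibrium_by_solve(A, x)
max_gap = float(np.max(np.abs(z_iter - z_direct)))
print(max_gap)
```

In this sketch the direct solve never materializes the sequence of iterates, which is the point made above about avoiding explicit computation of the infinite sequence. Gradients via implicit differentiation are not shown here.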
