Deep equilibrium models (DEQs), a typical class of implicit neural networks, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis of the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove, in this setting, that the spectral behavior of these Implicit-CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although derived here for Gaussian mixture data, the proposed theory and design principle are shown empirically to also apply to popular real-world datasets.
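To make the objects in the abstract concrete, the following is a minimal NumPy sketch of an empirical Implicit-CK: a DEQ's equilibrium features are computed by fixed-point iteration on two-class Gaussian mixture data, and their Gram matrix is formed. The abstract does not give the model equations, so the update rule `z ← φ(σ_a A z + σ_b B x) / √m`, the i.i.d. standard Gaussian weights, the `tanh` activation, and all names and values (`deq_fixed_point`, `empirical_ck`, `sigma_a`, `sigma_b`, `p`, `n`, `m`, `mu`) are assumptions chosen for illustration, not the paper's exact setup.

```python
import numpy as np


def deq_fixed_point(X, A, B, sigma_a=0.3, sigma_b=1.0, phi=np.tanh,
                    n_iter=200, tol=1e-10):
    """Iterate Z <- phi(sigma_a * A Z + sigma_b * B X) / sqrt(m) to equilibrium.

    Assumed (illustrative) DEQ form. X has shape (p, n); A is (m, m); B is (m, p).
    sigma_a is kept small so that the map is a contraction and the loop converges.
    """
    m = A.shape[0]
    Z = np.zeros((m, X.shape[1]))
    for _ in range(n_iter):
        Z_next = phi(sigma_a * (A @ Z) + sigma_b * (B @ X)) / np.sqrt(m)
        if np.max(np.abs(Z_next - Z)) < tol:
            return Z_next
        Z = Z_next
    return Z


def empirical_ck(Z):
    """Gram matrix of the equilibrium features (an empirical conjugate kernel)."""
    return Z.T @ Z


rng = np.random.default_rng(0)
p, n, m = 50, 40, 200  # input dimension, sample count, DEQ width (illustrative)

# Two-class Gaussian mixture with means +mu and -mu, scaled by 1/sqrt(p).
mu = np.zeros(p)
mu[0] = 2.0
labels = np.repeat([1.0, -1.0], n // 2)
X = (rng.standard_normal((p, n)) + np.outer(mu, labels)) / np.sqrt(p)

# Random initial weights with i.i.d. standard Gaussian entries (assumption).
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, p))

Z = deq_fixed_point(X, A, B)
G = empirical_ck(Z)
eigvals = np.linalg.eigvalsh(G)  # eigenspectrum of the empirical Implicit-CK
```

The choice `sigma_a = 0.3` is deliberate: the operator norm of an `m × m` standard Gaussian matrix is about `2√m`, so after the `1/√m` scaling the update is a contraction whenever `sigma_a` is below roughly 0.5, and plain iteration suffices. Larger values would call for a dedicated root-finding solver.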