Deep equilibrium models (DEQs), a typical class of implicit neural networks, have demonstrated remarkable success on various tasks. There is, however, a lack of theoretical understanding of the connections and differences between implicit DEQs and explicit neural network models. In this paper, leveraging recent advances in random matrix theory (RMT), we perform an in-depth analysis of the eigenspectra of the conjugate kernel (CK) and neural tangent kernel (NTK) matrices for implicit DEQs, when the input data are drawn from a high-dimensional Gaussian mixture. We prove that, in this setting, the spectral behavior of these Implicit-CKs and NTKs depends on the DEQ activation function and initial weight variances, but only via a system of four nonlinear equations. As a direct consequence of this theoretical result, we demonstrate that a shallow explicit network can be carefully designed to produce the same CK or NTK as a given DEQ. Although these results are derived for Gaussian mixture data, empirical results show that the proposed theory and design principle also apply to popular real-world datasets.
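To make the objects in the abstract concrete, the following is a minimal sketch of how an empirical Implicit-CK Gram matrix could be computed for a toy DEQ on two-class Gaussian mixture data. Everything specific here is an assumption made for illustration and is not taken from the paper: the DEQ form z = tanh(sigma_a * A z + sigma_b * B x), the weight scalings, the choice sigma_a = 0.2 (picked so that the fixed-point map is a contraction and plain iteration converges), and all function names. The NTK is not computed.

```python
import math
import random


def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def deq_fixed_point(A, B, x, sigma_a, sigma_b, tol=1e-10, max_iter=500):
    """Iterate z <- tanh(sigma_a * A z + sigma_b * B x) until it stops moving.

    Assumed DEQ form (illustrative only). Convergence relies on
    sigma_a * ||A|| < 1, which makes the map a contraction since tanh
    is 1-Lipschitz.
    """
    injection = [sigma_b * u for u in matvec(B, x)]  # fixed across iterations
    z = [0.0] * len(A)
    for _ in range(max_iter):
        z_new = [math.tanh(sigma_a * a + b)
                 for a, b in zip(matvec(A, z), injection)]
        if max(abs(u - v) for u, v in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    raise RuntimeError("fixed-point iteration did not converge")


def sample_gmm(n, p, rng, shift=1.0):
    """Draw n points from a two-class Gaussian mixture with means +/- shift."""
    X, labels = [], []
    for _ in range(n):
        c = rng.choice([-1, 1])
        # Mean c * shift in every coordinate, identity covariance,
        # rescaled by 1/sqrt(p) so that ||x|| stays of order one.
        x = [(c * shift + rng.gauss(0.0, 1.0)) / math.sqrt(p) for _ in range(p)]
        X.append(x)
        labels.append(c)
    return X, labels


def empirical_ck(Z):
    """Gram matrix of the equilibrium features: K[i][j] = z_i . z_j / m."""
    m = len(Z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / m for zj in Z] for zi in Z]


rng = random.Random(0)
n, p, m = 6, 16, 32            # samples, input dimension, DEQ width (toy sizes)
sigma_a, sigma_b = 0.2, 1.0    # assumed weight standard-deviation scales

# A has i.i.d. N(0, 1/m) entries (operator norm roughly 2), B has N(0, 1) entries.
A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(m)] for _ in range(m)]
B = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(m)]

X, labels = sample_gmm(n, p, rng)
Z = [deq_fixed_point(A, B, x, sigma_a, sigma_b) for x in X]
K = empirical_ck(Z)
```

At realistic sizes one would use a numerical library and examine the eigenvalues of K as n, p, and m grow together. The toy dimensions here only show the pipeline of sampling, solving for the equilibrium, and forming the Gram matrix.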