Empirical studies have revealed low-dimensional structures in the eigenspectra of weights, Hessians, gradients, and feature vectors of deep networks, observed consistently across datasets and architectures in the overparameterized regime. In this work, we analyze deep unconstrained feature models (UFMs) to give an analytic, layerwise explanation of how these structures emerge, including the bulk-plus-outlier structure of the Hessian spectrum and the alignment of gradient descent with the outlier eigenspace. We show that deep neural collapse underlies these phenomena, deriving explicit expressions for the eigenvalues and eigenvectors of many deep learning matrices in terms of the class feature means. Furthermore, we demonstrate that the full Hessian inherits its low-dimensional structure from the layerwise Hessians, and we validate our theory empirically in both UFMs and deep networks.
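As a minimal illustration of how such spectra can follow from collapsed class means (a sketch in our own notation, not the paper's derivation; $\mu_c$, $\alpha$, $U$, and $C$ are assumptions made only for this example), suppose the layerwise class means form a simplex ETF, $\mu_c = \alpha\,U\!\left(e_c - \tfrac{1}{C}\mathbf{1}\right)$ with $U^{\top}U = I_C$. Then the feature second-moment matrix is
$$
\frac{1}{C}\sum_{c=1}^{C}\mu_c\mu_c^{\top}
= \frac{\alpha^{2}}{C}\,U\!\left(I_C - \tfrac{1}{C}\mathbf{1}\mathbf{1}^{\top}\right)U^{\top},
$$
which has $C-1$ equal nonzero (outlier) eigenvalues $\alpha^{2}/C$ with eigenvectors in $\mathrm{span}\{\mu_c\}$ and a zero bulk, mirroring the low-dimensional, class-mean-driven structure described above.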