Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. Deep Recursive Feature Machines are a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate theoretically and empirically that DNC occurs in Deep Recursive Feature Machines as a consequence of the projection with the AGOP matrix computed at each layer. We then provide evidence that this mechanism holds for neural networks more generally. We show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
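To make the AGOP definition above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the predictor's Jacobian with respect to its input is available as a function; the names `agop`, `jacobian_fn`, and `toy_jacobian`, and the toy predictor f(x) = tanh(Wx), are hypothetical and chosen only for illustration.

```python
import numpy as np


def agop(jacobian_fn, X):
    """Average gradient outer product of a predictor over the rows of X.

    jacobian_fn(x) is assumed to return the (c, d) Jacobian of the predictor
    at input x, where d is the input dimension and c the output dimension.
    The result is the (d, d) uncentered covariance of the input-output
    gradients, averaged over the n rows of X.
    """
    n, d = X.shape
    M = np.zeros((d, d))
    for x in X:
        J = jacobian_fn(x)  # shape (c, d)
        M += J.T @ J        # accumulate the (d, d) gradient outer product
    return M / n


# Hypothetical toy predictor f(x) = tanh(W x), used only because its
# Jacobian has a closed form: diag(1 - tanh(W x)^2) @ W.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))


def toy_jacobian(x):
    # Scale each row of W by the derivative of tanh at the pre-activation.
    return (1.0 - np.tanh(W @ x) ** 2)[:, None] * W


X = rng.normal(size=(100, 5))
M = agop(toy_jacobian, X)  # symmetric positive semidefinite, shape (5, 5)
```

For a linear predictor f(x) = Wx the Jacobian is W at every input, so the AGOP reduces to W^T W. Its eigenvectors are then the right singular vectors of W and its eigenvalues the squared singular values, which is the kind of correspondence between weights and AGOP that the abstract refers to.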