Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a variety of settings, its emergence is typically explained via data-agnostic approaches, such as the unconstrained features model. In this work, we introduce a data-dependent setting where DNC forms due to feature learning through the average gradient outer product (AGOP). The AGOP is defined with respect to a learned predictor and is equal to the uncentered covariance matrix of its input-output gradients averaged over the training dataset. The Deep Recursive Feature Machine (Deep RFM) is a method that constructs a neural network by iteratively mapping the data with the AGOP and applying an untrained random feature map. We demonstrate empirically that DNC occurs in Deep RFM across standard settings as a consequence of the projection with the AGOP matrix computed at each layer. Further, we theoretically explain DNC in Deep RFM in an asymptotic setting and as a result of kernel learning. We then provide evidence that this mechanism holds for neural networks more generally. In particular, we show that the right singular vectors and values of the weights can be responsible for the majority of within-class variability collapse for DNNs trained in the feature learning regime. As observed in recent work, this singular structure is highly correlated with that of the AGOP.
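The two objects named above, the AGOP and the Deep RFM layer map, can be sketched in a few lines of numpy. This is a minimal illustration, not the paper's implementation: it assumes a scalar-output predictor (so the AGOP is the average of gradient outer products \(\frac{1}{n}\sum_i \nabla f(x_i)\nabla f(x_i)^\top\)), uses a ReLU random feature map as one standard choice of untrained feature map, and applies the matrix square root of the AGOP as the data transformation; the function names and the `width` parameter are illustrative.

```python
import numpy as np

def agop(grad_fn, X):
    """AGOP: uncentered covariance of input-output gradients, averaged
    over the rows of X. grad_fn(x) returns the gradient of the learned
    predictor at input x (scalar-output case for simplicity)."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for x in X:
        g = grad_fn(x)
        M += np.outer(g, g)
    return M / X.shape[0]

def psd_sqrt(M):
    """Symmetric square root of a PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def deep_rfm_layer(X, grad_fn, width, rng):
    """One Deep RFM layer sketch: project the data with the (square root
    of the) AGOP of the current predictor, then apply an untrained
    random ReLU feature map."""
    Z = X @ psd_sqrt(agop(grad_fn, X))              # AGOP projection
    W = rng.standard_normal((Z.shape[1], width)) / np.sqrt(Z.shape[1])
    return np.maximum(Z @ W, 0.0)                    # random features
```

Iterating `deep_rfm_layer` (refitting a predictor on the transformed data between layers) builds the network layer by layer; the abstract's empirical claim is that the AGOP projection step is what drives the collapse of within-class variability.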