Recently it has been observed that neural networks exhibit Neural Collapse (NC) during the final stage of training on classification problems. We empirically show that multivariate regression, as employed in imitation learning and other applications, exhibits Neural Regression Collapse (NRC), a new form of neural collapse: (NRC1) the last-layer feature vectors collapse to the subspace spanned by the $n$ principal components of the feature vectors, where $n$ is the dimension of the targets (for univariate regression, $n=1$); (NRC2) the last-layer feature vectors also collapse to the subspace spanned by the last-layer weight vectors; (NRC3) the Gram matrix of the weight vectors converges to a specific functional form that depends on the covariance matrix of the targets. After empirically establishing the prevalence of (NRC1)-(NRC3) across a variety of datasets and network architectures, we explain these phenomena by modeling the regression task within the Unconstrained Feature Model (UFM), in which the last-layer feature vectors are treated as free variables when minimizing the loss function. We show that when the regularization parameters in the UFM are strictly positive, (NRC1)-(NRC3) also emerge as solutions of the UFM optimization problem; when the regularization parameters are zero, there is no collapse. To our knowledge, this is the first empirical and theoretical study of neural collapse in the context of regression. This extension is significant not only because it broadens the applicability of neural collapse to a new category of problems but also because it suggests that the phenomenon of neural collapse could be a universal behavior in deep learning.
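As a minimal sketch of how (NRC1) could be quantified empirically, the snippet below computes the fraction of last-layer feature variance captured by the top-$n$ principal components; under NRC1 this ratio should approach 1 late in training. The feature matrix here is random placeholder data (the paper's actual networks and datasets are not reproduced), so only the computation itself is illustrative.

```python
import numpy as np

# Hypothetical NRC1 diagnostic: fraction of feature variance lying in the
# top-n principal subspace, where n is the target dimension.
rng = np.random.default_rng(0)
n = 3                                    # assumed target dimension
H = rng.normal(size=(1000, 64))          # placeholder last-layer features
                                         # (num_samples x feature_dim)

Hc = H - H.mean(axis=0)                  # center the features
# Squared singular values of the centered feature matrix give the
# variances along the principal components.
s = np.linalg.svd(Hc, compute_uv=False)
explained = (s[:n] ** 2).sum() / (s ** 2).sum()

# Under NRC1, `explained` tends toward 1 during the final stage of training;
# for random features it stays well below 1.
print(f"variance in top-{n} principal subspace: {explained:.3f}")
```

An analogous check for (NRC2) would project the features onto the row space of the last-layer weight matrix and measure the residual norm.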