The current paradigm for training deep neural networks on classification tasks involves minimizing the empirical risk, pushing the training loss toward zero even after the training error has vanished. In this terminal phase of training, it has been observed that the last-layer features collapse to their class-means and that these class-means converge to the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is termed Neural Collapse (NC). To understand it theoretically, recent works employ a simplified unconstrained feature model and prove that NC emerges at the global solutions of the training problem. However, when the training dataset is class-imbalanced, some NC properties no longer hold; for example, the class-means geometry skews away from the simplex ETF as the loss converges. In this paper, we generalize NC to the imbalanced regime for the cross-entropy loss under the unconstrained ReLU feature model. We prove that, while the within-class feature collapse property still holds in this setting, the class-means converge to a structure of orthogonal vectors with different lengths. Furthermore, we find that the classifier weights are aligned with the scaled and centered class-means, with scaling factors that depend on the number of training samples in each class, generalizing NC in the class-balanced setting. We empirically verify our results through experiments on practical architectures and datasets.
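To make the three claimed properties concrete, here is a minimal NumPy diagnostic sketch (not the paper's code; function names such as `nc1_metric` are illustrative) that, given last-layer features `H`, integer labels `y`, and classifier weights `W` extracted from a trained network, measures (i) within-class collapse, (ii) the pairwise angles and lengths of the class-means, and (iii) the alignment between classifier rows and centered class-means:

```python
# Minimal diagnostic sketch for the NC properties discussed above.
# Assumptions: H is an (N, d) array of last-layer features, y an (N,)
# integer label array with classes 0..K-1, W a (K, d) classifier matrix.
import numpy as np

def class_means(H, y, K):
    """Per-class feature means, stacked into a (K, d) matrix."""
    return np.stack([H[y == k].mean(axis=0) for k in range(K)])

def nc1_metric(H, y, K):
    """Within- vs. between-class variability, tr(Sigma_W Sigma_B^+) / K.
    Tends to 0 as features collapse to their class-means (NC1)."""
    M, g = class_means(H, y, K), H.mean(axis=0)
    Sw = np.zeros((H.shape[1], H.shape[1]))
    for k in range(K):
        D = H[y == k] - M[k]          # deviations from the class-mean
        Sw += D.T @ D / H.shape[0]
    Sb = (M - g).T @ (M - g) / K      # scatter of class-means around g
    return np.trace(Sw @ np.linalg.pinv(Sb)) / K

def mean_geometry(H, y, K):
    """Pairwise cosines and norms of the class-means. Under imbalance the
    paper predicts near-zero off-diagonal cosines (orthogonal means) with
    unequal norms; a balanced simplex ETF would instead give cosines of
    -1/(K-1) between the globally centered means."""
    M = class_means(H, y, K)
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    return Mn @ Mn.T, np.linalg.norm(M, axis=1)

def classifier_alignment(W, H, y, K):
    """Cosine between each classifier row w_k and the globally centered
    class-mean; values near 1 indicate alignment up to per-class scaling."""
    M = class_means(H, y, K) - H.mean(axis=0)
    return np.einsum('kd,kd->k', W, M) / (
        np.linalg.norm(W, axis=1) * np.linalg.norm(M, axis=1))
```

Applied to features collected at the terminal phase of training on a class-imbalanced dataset, the results above would predict `nc1_metric` near zero, near-zero off-diagonal cosines with class-mean lengths varying across classes, and alignment cosines near one.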