The notion of neural collapse refers to several emergent phenomena that have been empirically observed across various canonical classification problems. During the terminal phase of training a deep neural network, the feature embeddings of all examples of the same class tend to collapse to a single representation, and the features of different classes tend to separate as much as possible. Neural collapse is often studied through a simplified model, called the unconstrained feature representation, in which the model is assumed to have "infinite expressivity" and can map each data point to any arbitrary representation. In this work, we propose a more realistic variant of the unconstrained feature representation that takes the limited expressivity of the network into account. Empirical evidence suggests that the memorization of noisy data points leads to a degradation (dilation) of neural collapse. Using a model of this memorization-dilation (M-D) phenomenon, we show one mechanism by which different losses lead to different performance of the trained network on noisy data. Our proofs reveal why label smoothing, a modification of cross-entropy empirically observed to produce a regularization effect, leads to improved generalization in classification tasks.
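The abstract describes label smoothing as a modification of cross-entropy. As an illustration only, the sketch below implements the commonly used form, assuming a smoothing parameter `eps` that moves probability mass `eps` from the one-hot target to a uniform distribution over the K classes. The paper's own parametrization may differ, and the names `smoothed_targets` and `label_smoothing_ce` are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def smoothed_targets(label: int, num_classes: int, eps: float) -> list[float]:
    # Assumed convention: the true class keeps 1 - eps, and eps is spread
    # uniformly over all K classes (including the true one).
    return [
        (1.0 - eps if k == label else 0.0) + eps / num_classes
        for k in range(num_classes)
    ]


def label_smoothing_ce(logits: Sequence[float], label: int, eps: float = 0.1) -> float:
    # Cross-entropy between the smoothed target distribution and softmax(logits).
    # With eps = 0 this is ordinary cross-entropy on the one-hot label.
    targets = smoothed_targets(label, len(logits), eps)
    return -sum(t * lp for t, lp in zip(targets, log_softmax(logits)))


if __name__ == "__main__":
    for gap in (5.0, 100.0):
        logits = [gap, 0.0, 0.0]
        print(gap, label_smoothing_ce(logits, 0, eps=0.0), label_smoothing_ce(logits, 0, eps=0.1))
```

With `eps = 0`, the loss keeps decreasing as the true-class logit grows. With `eps > 0`, the loss has a finite optimal logit gap: for K = 3 and `eps = 0.1`, with the two other logits equal, the optimum is at a gap of ln 28 ≈ 3.33, so pushing the true-class logit to 100 costs more than stopping at 5.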