When overparameterized deep networks are trained for classification tasks, it has been widely observed that the learned features exhibit a so-called "neural collapse" phenomenon. More specifically, for the output features of the penultimate layer, the within-class features of each class converge to their class mean, and the means of different classes form a simplex equiangular tight frame (ETF) that is also aligned with the last-layer classifier. As feature normalization in the last layer has become common practice in modern representation learning, in this work we theoretically justify the neural collapse phenomenon for normalized features. Based on an unconstrained feature model, we simplify the empirical loss function of a multi-class classification task into a nonconvex optimization problem over a Riemannian manifold by constraining all features and classifiers to the sphere. In this setting, we analyze the nonconvex landscape of the Riemannian optimization problem over the product of spheres and show that it is benign: the only global minimizers are the neural collapse solutions, and all other critical points are strict saddles with negative curvature. Experimental results on practical deep networks corroborate our theory and demonstrate that better representations can be learned faster via feature normalization.
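To make the sphere-constrained problem concrete, the sketch below runs plain Riemannian gradient descent on a toy unconstrained feature model. It is a hypothetical illustration, not the authors' code, and several choices are assumptions not stated in the abstract: cross-entropy applied to logits scaled by a temperature `tau`, no other regularization, classifiers `W` (d x K) and features `H` (d x N) stored as unit-norm columns, a "project onto the tangent space, then renormalize" retraction, and the toy sizes and step size. The helper names (`normalize_columns`, `tangent_project`, `loss_and_egrads`, `train`, `nc_metrics`) are likewise hypothetical. The printed metrics are rough proxies for within-class collapse, the ETF structure of class means (pairwise cosine `-1/(K-1)`), and classifier/mean alignment.

```python
import numpy as np


def normalize_columns(X: np.ndarray) -> np.ndarray:
    """Retraction onto the product of spheres: rescale every column to unit norm."""
    return X / np.linalg.norm(X, axis=0, keepdims=True)


def tangent_project(X: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Riemannian gradient on each sphere: drop the component of G along X."""
    return G - X * np.sum(X * G, axis=0, keepdims=True)


def loss_and_egrads(W, H, Y, tau):
    """Mean cross-entropy of logits tau * W^T H and its Euclidean gradients.

    W: d x K classifiers, H: d x N features, Y: K x N one-hot labels.
    """
    N = H.shape[1]
    Z = tau * (W.T @ H)                                # K x N logits
    Z = Z - Z.max(axis=0, keepdims=True)               # stabilize the softmax
    log_p = Z - np.log(np.exp(Z).sum(axis=0, keepdims=True))
    P = np.exp(log_p)
    loss = -np.sum(Y * log_p) / N
    G = (P - Y) / N                                    # dLoss / dlogits
    return loss, tau * H @ G.T, tau * W @ G            # grads w.r.t. W and H


def train(K=4, n=5, d=8, tau=2.0, lr=0.3, steps=5000, seed=0):
    """Riemannian gradient descent on the sphere-constrained feature model (toy sizes assumed)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(K), n)                # n samples per class
    Y = np.eye(K)[:, labels]                           # K x N one-hot
    W = normalize_columns(rng.standard_normal((d, K)))
    H = normalize_columns(rng.standard_normal((d, K * n)))
    losses = []
    for _ in range(steps):
        loss, gW, gH = loss_and_egrads(W, H, Y, tau)
        losses.append(loss)
        # Step along the Riemannian gradient, then retract back onto the spheres.
        W = normalize_columns(W - lr * tangent_project(W, gW))
        H = normalize_columns(H - lr * tangent_project(H, gH))
    return W, H, labels, losses


def nc_metrics(W, H, labels, K):
    """Rough neural-collapse proxies computed from the trained W and H."""
    means = np.stack([H[:, labels == k].mean(axis=1) for k in range(K)], axis=1)
    means_hat = normalize_columns(means)               # d x K unit class means
    # Within-class collapse: average cosine of each feature to its class mean.
    within_cos = np.mean(np.sum(H * means_hat[:, labels], axis=0))
    # ETF structure: worst deviation of pairwise mean cosines from -1/(K-1).
    gram = means_hat.T @ means_hat
    etf_gap = np.max(np.abs(gram[~np.eye(K, dtype=bool)] + 1.0 / (K - 1)))
    # Alignment: average cosine between each classifier and its class mean.
    self_dual = np.mean(np.sum(W * means_hat, axis=0))
    return within_cos, etf_gap, self_dual


W, H, labels, losses = train()
within_cos, etf_gap, self_dual = nc_metrics(W, H, labels, K=4)
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}")
print(f"within-class cos {within_cos:.4f}, ETF gap {etf_gap:.4f}, W/mean cos {self_dual:.4f}")
```

Renormalizing after a tangent step is the simplest retraction on the sphere; the exponential map would serve equally well. The printed numbers depend on the seed and hyperparameters and are not results from this work.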