Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, a growing body of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while NC in the last layer is well studied theoretically, much less is known about its multi-layer counterpart, deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers, at the price of an extra assumption. Our paper fills this gap by generalizing the established analytical framework for NC, the unconstrained features model, to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) optimizing deep unconstrained features models via gradient descent yields solutions that agree well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
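As a rough illustration of the kind of model discussed above, here is a minimal NumPy sketch of a deep unconstrained features model for binary classification, optimized with plain full-batch gradient descent. It assumes an MSE loss, ReLU activations between square layers followed by a linear head, and a single weight-decay coefficient `lam` shared by the free features `H1` and every weight matrix. These choices, all sizes and step sizes, and the helper names (`init_params`, `loss_and_grads`, `nc1_ratio`, `train`) are hypothetical and are not taken from the paper. `nc1_ratio` is a simplified within-class collapse measure (trace of within-class scatter over trace of between-class scatter of the last hidden features), not the standard NC1 metric.

```python
import numpy as np

# Hypothetical sketch: the MSE loss, shared weight decay `lam`, and all sizes
# are illustrative assumptions, not the paper's exact objective or setup.


def init_params(rng, d, n_per_class, depth, num_classes=2):
    """Free features H1, `depth` weight matrices, and one-hot targets Y."""
    n = num_classes * n_per_class
    h1 = rng.standard_normal((d, n))  # unconstrained (free) features
    # depth - 1 square ReLU layers with He-style scaling, then a linear head.
    ws = [rng.standard_normal((d, d)) * np.sqrt(2.0 / d) for _ in range(depth - 1)]
    ws.append(rng.standard_normal((num_classes, d)) / np.sqrt(d))
    # Columns 0..n_per_class-1 belong to class 0, the remaining ones to class 1.
    y = np.kron(np.eye(num_classes), np.ones((1, n_per_class)))
    return h1, ws, y


def loss_and_grads(h1, ws, y, lam):
    """MSE fit term plus weight decay on H1 and every W, with manual backprop."""
    n = y.shape[1]
    zs, pre = [h1], []  # zs[l] is the input to layer l, pre[l] its pre-activation
    for w in ws[:-1]:
        pre.append(w @ zs[-1])
        zs.append(np.maximum(pre[-1], 0.0))  # ReLU
    resid = ws[-1] @ zs[-1] - y
    reg = np.sum(h1 ** 2) + sum(np.sum(w ** 2) for w in ws)
    loss = 0.5 * np.sum(resid ** 2) / n + 0.5 * lam * reg

    # Backward pass, from the linear head down to the free features.
    d_out = resid / n
    grads = [None] * len(ws)
    grads[-1] = d_out @ zs[-1].T + lam * ws[-1]
    dz = ws[-1].T @ d_out  # gradient w.r.t. zs[-1]
    for l in range(len(ws) - 2, -1, -1):
        da = dz * (pre[l] > 0)  # ReLU derivative
        grads[l] = da @ zs[l].T + lam * ws[l]
        dz = ws[l].T @ da  # gradient w.r.t. zs[l]
    g_h1 = dz + lam * h1
    return loss, g_h1, grads, zs[-1]


def nc1_ratio(feats, labels):
    """Simplified collapse measure: tr(within scatter) / tr(between scatter)."""
    mu = feats.mean(axis=1, keepdims=True)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        fc = feats[:, labels == c]
        mc = fc.mean(axis=1, keepdims=True)
        within += np.sum((fc - mc) ** 2)
        between += fc.shape[1] * np.sum((mc - mu) ** 2)
    return within / between


def train(d=8, n_per_class=5, depth=3, lam=1e-3, lr=0.05, steps=3000, seed=0):
    """Plain full-batch gradient descent on H1 and all weight matrices."""
    rng = np.random.default_rng(seed)
    h1, ws, y = init_params(rng, d, n_per_class, depth)
    labels = np.repeat(np.arange(y.shape[0]), n_per_class)
    history = []
    for _ in range(steps):
        loss, g_h1, g_ws, _ = loss_and_grads(h1, ws, y, lam)
        history.append(loss)
        h1 = h1 - lr * g_h1
        ws = [w - lr * g for w, g in zip(ws, g_ws)]
    loss, _, _, feats = loss_and_grads(h1, ws, y, lam)
    history.append(loss)
    return history, nc1_ratio(feats, labels)


if __name__ == "__main__":
    history, nc1 = train()
    print(f"loss {history[0]:.4f} -> {history[-1]:.4f}, NC1 ratio {nc1:.3e}")
```

Running the script prints the loss at the start and end of training together with the collapse ratio. How small that ratio becomes depends on `lam`, `lr`, and `steps`, so treat it as something to explore rather than as a reproduction of the paper's experiments.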