Neural collapse provides an elegant mathematical characterization of learned last-layer representations (a.k.a. features) and classifier weights in deep classification models. Such results not only provide insights but also motivate new techniques for improving practical deep models. However, most existing empirical and theoretical studies of neural collapse focus on the case where the number of classes is small relative to the dimension of the feature space. This paper extends neural collapse to cases where the number of classes is much larger than the dimension of the feature space, a setting that arises broadly in language models, retrieval systems, and face recognition applications. We show that the features and classifier exhibit a generalized neural collapse phenomenon, in which the minimum one-vs-rest margin is maximized. We provide an empirical study to verify the occurrence of generalized neural collapse in practical deep neural networks. Moreover, we provide a theoretical study showing that generalized neural collapse provably occurs under the unconstrained feature model with a spherical constraint, under certain technical conditions on the feature dimension and the number of classes.
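To make the quantity "minimum one-vs-rest margin" concrete, the following is a minimal sketch under stated assumptions, not the paper's formulation. It assumes one collapsed feature vector per class, unit-norm features and classifier weights (the spherical constraint), and defines the margin of class k as the logit of class k minus the largest logit among the other classes. The function name `min_one_vs_rest_margin` and the toy configurations are hypothetical and chosen only for illustration.

```python
import math


def normalize(v):
    """Scale a vector to unit Euclidean norm (the assumed spherical constraint)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def min_one_vs_rest_margin(features, weights):
    """Smallest one-vs-rest margin over all classes.

    features[k] is the (assumed collapsed) feature of class k and
    weights[j] is the classifier vector of class j. The margin of class k
    is taken to be <w_k, h_k> - max_{j != k} <w_j, h_k>.
    """
    H = [normalize(h) for h in features]
    W = [normalize(w) for w in weights]
    margins = []
    for k, h in enumerate(H):
        own = dot(W[k], h)
        rest = max(dot(W[j], h) for j in range(len(W)) if j != k)
        margins.append(own - rest)
    return min(margins)


# Toy case with more classes than dimensions: K = 4 classes in d = 2.
even_angles = [2 * math.pi * k / 4 for k in range(4)]
skew_angles = [0.0, 0.5, math.pi, 1.5 * math.pi]

W_even = [[math.cos(a), math.sin(a)] for a in even_angles]
W_skew = [[math.cos(a), math.sin(a)] for a in skew_angles]

# Features are set equal to the classifier vectors (an assumed self-dual setup).
margin_even = min_one_vs_rest_margin(W_even, W_even)
margin_skew = min_one_vs_rest_margin(W_skew, W_skew)
print(margin_even, margin_skew)
```

In this toy setting, the evenly spread vectors give a larger minimum margin than the skewed ones, because two of the skewed vectors sit close together. This illustrates why maximizing the minimum margin pushes the class vectors apart on the sphere when they cannot all be mutually orthogonal.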