Recently, a vast amount of literature has focused on the "Neural Collapse" (NC) phenomenon, which emerges when training neural network (NN) classifiers beyond the zero training error point. The core component of NC is the decrease in the within-class variability of the network's deepest features, dubbed NC1. The theoretical works that study NC are typically based on simplified unconstrained features models (UFMs), which mask any effect of the data on the extent of collapse. In this paper, we provide a kernel-based analysis that does not suffer from this limitation. First, given a kernel function, we establish expressions for the traces of the within- and between-class covariance matrices of the samples' features (and, consequently, an NC1 metric). Then, we turn to kernels associated with shallow NNs: the NN Gaussian Process kernel (NNGP), associated with the network at initialization, and the complementary Neural Tangent Kernel (NTK), associated with its training in the "lazy regime". Interestingly, we show that the NTK does not represent more collapsed features than the NNGP for prototypical data models. Since NC emerges from training, we then consider an alternative to the NTK: the recently proposed adaptive kernel, which generalizes the NNGP to model the feature mapping learned from the training data. Contrasting our NC1 analysis for these two kernels yields insights into the effect of the data distribution on the extent of collapse, which align empirically with the behavior observed in practical training of NNs.
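To illustrate the kernel-based NC1 computation mentioned above, consider a minimal sketch (not necessarily the paper's exact derivation or normalization): assume $C$ balanced classes with $n$ samples each ($N = Cn$), a feature map $\phi$ with kernel $k(x, x') = \langle \phi(x), \phi(x') \rangle$, class means $\mu_c = \frac{1}{n}\sum_{i} \phi(x_{c,i})$, and global mean $\mu_G = \frac{1}{N}\sum_{c,i} \phi(x_{c,i})$. The covariance traces then admit kernel-only expressions:
\[
\mathrm{Tr}(\Sigma_W) = \frac{1}{N}\sum_{c,i} \big\| \phi(x_{c,i}) - \mu_c \big\|^2
= \frac{1}{N}\sum_{c,i} k(x_{c,i}, x_{c,i}) - \frac{1}{C n^2}\sum_{c}\sum_{i,j} k(x_{c,i}, x_{c,j}),
\]
\[
\mathrm{Tr}(\Sigma_B) = \frac{1}{C}\sum_{c} \big\| \mu_c - \mu_G \big\|^2
= \frac{1}{C n^2}\sum_{c}\sum_{i,j} k(x_{c,i}, x_{c,j}) - \frac{1}{N^2}\sum_{c,c'}\sum_{i,j} k(x_{c,i}, x_{c',j}),
\]
so that a scalar NC1 metric can be formed, for instance, as the ratio $\mathrm{Tr}(\Sigma_W)/\mathrm{Tr}(\Sigma_B)$; the specific metric and normalization used in the analysis may differ (e.g., a variant based on $\mathrm{Tr}(\Sigma_W \Sigma_B^{\dagger})$).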