Neural collapse is a recently identified phenomenon describing the geometry of solutions in the terminal phase of model training. In this paper, we study neural collapse in the context of imbalanced data. We consider the $L$-extended unconstrained feature model with a bias term and provide a theoretical analysis of its global minimizers. Our findings are: (1) Features within the same class collapse to their class mean, as in the balanced case and the imbalanced case without bias. (2) The geometric structure is mainly captured by the left orthonormal transformation of the product of the $L$ linear classifiers and the right orthonormal transformation of the class-mean matrix. (3) Some rows of the left orthonormal transformation of the product of the $L$ linear classifiers collapse to zero while the remaining rows are mutually orthogonal; which rows do so depends on the singular values of $\hat Y=(I_K-\frac{1}{N}\mathbf{n}1^\top_K)D$, where $K$ is the number of classes, $\mathbf{n}$ is the vector of per-class sample sizes, $N$ is the total sample size, and $D$ is the diagonal matrix with diagonal entries $\sqrt{\mathbf{n}}$. Analogous results hold for the columns of the right orthonormal transformation of the product of the class-mean matrix and $D$. (4) The $i$-th row of the left orthonormal transformation of the product of the $L$ linear classifiers aligns with the $i$-th column of the right orthonormal transformation of the product of the class-mean matrix and $D$. (5) We provide estimates of the singular values of $\hat Y$. Numerical experiments support these theoretical findings.
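The matrix $\hat Y=(I_K-\frac{1}{N}\mathbf{n}1^\top_K)D$ is easy to examine numerically. The sketch below, with an illustrative (not from the paper) choice of class sizes $\mathbf{n}=(5,3,2)$, builds $\hat Y$ and computes its singular values; since $\hat Y\sqrt{\mathbf{n}}=0$, one singular value is always zero and $\hat Y$ has rank $K-1$.

```python
import numpy as np

# Illustrative imbalanced setting: K = 3 classes with sizes (5, 3, 2).
n = np.array([5.0, 3.0, 2.0])   # per-class sample sizes (hypothetical values)
K = n.size
N = n.sum()                     # total sample size

D = np.diag(np.sqrt(n))         # diagonal matrix with entries sqrt(n_k)
# hat(Y) = (I_K - (1/N) n 1_K^T) D
Y_hat = (np.eye(K) - np.outer(n, np.ones(K)) / N) @ D

s = np.linalg.svd(Y_hat, compute_uv=False)   # descending singular values
# sqrt(n) lies in the null space of Y_hat, so the smallest
# singular value is 0 and rank(Y_hat) = K - 1.
print(s)
```

The same construction can be repeated for any imbalance profile $\mathbf{n}$ to see how the nonzero singular values of $\hat Y$ spread as the imbalance grows.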