The generalization gap on long-tailed datasets arises largely because most categories have only a few training samples. Decoupled training achieves better performance by training the backbone and the classifier separately. What causes the poorer performance of end-to-end model training (e.g., logits margin-based methods)? In this work, we identify a key factor that affects the learning of the classifier: the features fed into the classifier are channel-correlated and have low entropy. From the perspective of information theory, we analyze why the cross-entropy loss tends to produce highly correlated features on imbalanced data. In addition, we theoretically analyze and prove the impact of this correlation on the gradients of the classifier weights, the condition number of the Hessian, and the logits margin-based approach. We therefore first propose Channel Whitening, which decorrelates ("scatters") the classifier's inputs in order to decouple the weight updates and reshape the skewed decision boundary, and which achieves satisfactory results when combined with a logits margin-based method. However, when the number of minor classes is large, batch imbalance and greater participation in training cause over-fitting of the major classes. To address these problems, we also propose two novel modules, the Block-based Relatively Balanced Batch Sampler (B3RS) and Batch Embedded Training (BET), which make end-to-end training perform even better than decoupled training. Experimental results on the long-tailed classification benchmarks CIFAR-LT and ImageNet-LT demonstrate the effectiveness of our method.
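To make the idea of decorrelating the classifier's inputs concrete, the following is a minimal sketch of generic ZCA whitening applied across the channels of a batch of features. It is not the Channel Whitening module itself, whose details the abstract does not give. The function name `whiten_channels`, the `eps` value, and the synthetic features are assumptions made for illustration only.

```python
import numpy as np


def whiten_channels(features, eps=1e-5):
    """ZCA-whiten a (batch, channels) matrix (hypothetical helper).

    Generic batch whitening, assumed here for illustration; it is not
    taken from the paper.
    """
    # Center each channel over the batch.
    centered = features - features.mean(axis=0, keepdims=True)
    # Channel covariance, regularized so every eigenvalue is positive.
    cov = centered.T @ centered / features.shape[0]
    cov = cov + eps * np.eye(cov.shape[0])
    # The inverse square root of the covariance is the ZCA transform.
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return centered @ transform


# Synthetic, highly correlated features: four noisy copies of one signal.
rng = np.random.default_rng(0)
base = rng.normal(size=(512, 1))
features = np.hstack(
    [base + 0.1 * rng.normal(size=(512, 1)) for _ in range(4)]
)

corr_before = np.corrcoef(features, rowvar=False)
whitened = whiten_channels(features)
cov_after = whitened.T @ whitened / whitened.shape[0]

# Condition number of the channel covariance before and after whitening.
cond_before = np.linalg.cond(np.cov(features, rowvar=False))
cond_after = np.linalg.cond(cov_after)
```

In this toy setting the channel covariance after whitening is close to the identity, and its condition number drops sharply. This is in the spirit of the abstract's point about correlated features and the conditioning of the Hessian, although the covariance here is only an illustrative stand-in for the Hessian analyzed in the paper.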