Label Smoothing (LS) is widely adopted to reduce overconfidence in neural network predictions and improve generalization. Despite these benefits, recent studies reveal two critical issues with LS. First, LS induces overconfidence in misclassified samples. Second, it compacts feature representations into overly tight clusters, diluting intra-class diversity, although the precise cause of this phenomenon remained elusive. In this paper, we analytically decompose the LS-induced loss, exposing two key terms: (i) a regularization term that dampens overconfidence only when the prediction is correct, and (ii) an error-amplification term that arises under misclassifications. This latter term compels the network to reinforce incorrect predictions with undue certainty, exacerbating representation collapse. To address these shortcomings, we propose Max Suppression (MaxSup), which applies uniform regularization to both correct and incorrect predictions by penalizing the top-1 logit rather than the ground-truth logit. Through extensive feature-space analyses, we show that MaxSup restores intra-class variation and sharpens inter-class boundaries. Experiments on large-scale image classification and multiple downstream tasks confirm that MaxSup is a more robust alternative to LS. Code is available at: https://github.com/ZhouYuxuanYX/Maximum-Suppression-Regularization
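The contrast between the two regularizers can be sketched in a few lines. The decomposition below is an illustrative reading of the abstract, not the paper's exact derivation: it assumes the LS-induced regularization term is proportional to the gap between the ground-truth logit and the mean logit, and that MaxSup simply swaps in the top-1 logit; the scaling constants may differ from the paper.

```python
import numpy as np

def cross_entropy(logits, target):
    # Standard softmax cross-entropy for a single sample.
    z = logits - logits.max()  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def label_smoothing_loss(logits, target, eps=0.1):
    # Illustrative decomposition: LS adds a term that penalizes the
    # GROUND-TRUTH logit relative to the mean logit. When the sample is
    # misclassified, this term no longer suppresses the (wrong) top-1
    # logit, which is the error-amplification issue described above.
    return cross_entropy(logits, target) + eps * (logits[target] - logits.mean())

def maxsup_loss(logits, target, eps=0.1):
    # MaxSup penalizes the TOP-1 logit instead, so the regularizer acts
    # identically whether the prediction is correct or not.
    return cross_entropy(logits, target) + eps * (logits.max() - logits.mean())
```

On a correctly classified sample the two losses coincide (the top-1 logit is the ground-truth logit); on a misclassified sample MaxSup's penalty is strictly larger, since it targets the overconfident wrong prediction that LS leaves untouched.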