Deep learning has made significant advances in computer vision, particularly in image classification. Despite their high accuracy on training data, deep learning models often face challenges related to complexity and overfitting. One notable concern is that a model often relies heavily on a small subset of filters when making predictions. This dependency can compromise generalization and increase vulnerability to minor input variations. Regularization techniques such as weight decay, dropout, and data augmentation are commonly used to address overfitting, but they may not directly tackle the reliance on specific filters. We observe that this over-reliance becomes more severe when fast-learning filters deprive slow-learning filters of learning opportunities. Drawing inspiration from image augmentation research that combats over-reliance on specific image regions by removing and replacing parts of images, we mitigate over-reliance on strong filters by substituting highly activated features. To this end, we present Catch-up Mix, a novel method that provides learning opportunities to a wide range of filters during training, focusing on filters that may lag behind. By mixing activation maps with relatively lower norms, Catch-up Mix promotes more diverse representations and reduces reliance on a small subset of filters. Experimental results on various vision classification datasets demonstrate the superiority of our method and its enhanced robustness.
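The abstract describes mixing activation maps with relatively lower norms but does not give the exact rule. The following is a minimal sketch of one plausible reading, not the authors' algorithm: for two samples passed through the same layer, each channel of the mixed feature keeps whichever sample's activation map has the lower L2 norm, so that weakly activated filters receive the training signal. The function names `channel_norm`, `catch_up_mix_sketch`, and `mix_labels` are hypothetical, and weighting the labels by the fraction of channels taken from each sample is an assumption borrowed from mixup-style methods. Plain Python lists are used to keep the sketch dependency-free; a real implementation would operate on framework tensors.

```python
import math


def channel_norm(act_map):
    """L2 norm of one channel's activation map (a 2D list of floats)."""
    return math.sqrt(sum(v * v for row in act_map for v in row))


def catch_up_mix_sketch(feat_a, feat_b):
    """Mix two feature tensors channel by channel (assumed rule, see lead-in).

    feat_a, feat_b: lists of C activation maps from the same layer,
    one list per sample. For each channel, the map with the LOWER
    L2 norm is kept. Returns the mixed feature and lam, the fraction
    of channels taken from feat_a.
    """
    if not feat_a or len(feat_a) != len(feat_b):
        raise ValueError("both features need the same, non-zero channel count")

    mixed = []
    taken_from_a = 0
    for map_a, map_b in zip(feat_a, feat_b):
        if channel_norm(map_a) <= channel_norm(map_b):
            mixed.append(map_a)
            taken_from_a += 1
        else:
            mixed.append(map_b)

    lam = taken_from_a / len(feat_a)
    return mixed, lam


def mix_labels(y_a, y_b, lam):
    """Assumed mixup-style label mixing, weighted by the channel fraction."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]


# Two samples, two channels each, 1x2 activation maps.
feat_a = [[[3.0, 4.0]], [[0.1, 0.0]]]  # channel norms: 5.0, 0.1
feat_b = [[[1.0, 0.0]], [[2.0, 2.0]]]  # channel norms: 1.0, about 2.83

mixed, lam = catch_up_mix_sketch(feat_a, feat_b)
soft_label = mix_labels([1.0, 0.0], [0.0, 1.0], lam)
```

In this toy input, channel 0 is taken from the second sample and channel 1 from the first, so the strongly activated maps (norms 5.0 and about 2.83) are the ones substituted out, and each sample contributes half of the channels.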