Recently, the robustness of deep learning models has received widespread attention, and various methods for improving it have been proposed, including adversarial training, architectural modifications, loss-function design, and certified defenses. However, the mechanisms underlying robustness to adversarial attacks are still not fully understood, and related research remains limited. Here, we identify a significant factor that affects model robustness: the distribution of the softmax values assigned to a sample's non-true (incorrect) labels. We find that the outcome of an attack is highly correlated with the characteristics of this distribution, and we therefore propose a loss function that suppresses the diversity of the softmax distribution. Extensive experiments show that our method improves robustness without significant additional time overhead.
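The abstract does not state the exact form of the proposed loss. As a purely illustrative sketch, assume that "distribution diversity" is measured by the variance of the softmax probabilities over the non-true classes; a training objective could then add a weighted variance penalty to the usual cross-entropy. The function names `non_true_variance` and `regularized_loss` and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def log_sum_exp(logits: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(z)))."""
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def softmax(logits: Sequence[float]) -> List[float]:
    """Softmax probabilities, shifted by the max logit for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard cross-entropy: -log softmax(logits)[label]."""
    return log_sum_exp(logits) - logits[label]


def non_true_variance(probs: Sequence[float], label: int) -> float:
    """Hypothetical diversity measure: variance of the probabilities
    assigned to every class except the true label."""
    others = [p for i, p in enumerate(probs) if i != label]
    if not others:
        raise ValueError("need at least two classes")
    mean = sum(others) / len(others)
    return sum((p - mean) ** 2 for p in others) / len(others)


def regularized_loss(logits: Sequence[float], label: int, lam: float = 1.0) -> float:
    """Assumed objective: cross-entropy plus lam times the variance of the
    non-true softmax values (an illustration, not the authors' loss)."""
    probs = softmax(logits)
    return cross_entropy(logits, label) + lam * non_true_variance(probs, label)
```

When the non-true classes receive identical probabilities (e.g. logits `[2.0, 1.0, 1.0]` with label `0`), the penalty vanishes and the objective reduces to plain cross-entropy; unequal non-true probabilities (e.g. `[2.0, 1.0, 0.0]`) increase it. Variance is only one possible choice here; entropy or another dispersion measure over the non-true classes would be an equally plausible reading of the abstract.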