Adversarial training has been widely used to enhance the robustness of neural network models against adversarial attacks. Despite the popularity of neural network models, a significant gap exists between their natural and robust accuracy. In this paper, we identify the common use of one-hot vectors as labels as one of the primary reasons for this gap, since such labels hinder the learning process for image recognition. Representing ambiguous images with one-hot vectors is imprecise and may lead the model to suboptimal solutions. To overcome this issue, we propose a novel method called Low Temperature Distillation (LTD) that generates soft labels using a modified knowledge distillation framework. Unlike previous approaches, LTD uses a relatively low temperature in the teacher model and fixed but different temperatures for the teacher and student models. This modification boosts the model's robustness without encountering the gradient masking problem associated with defensive distillation. The experimental results demonstrate the effectiveness of the proposed LTD method combined with previous techniques, achieving robust accuracies of 58.19%, 31.13%, and 42.08% on the CIFAR-10, CIFAR-100, and ImageNet data sets, respectively, without additional unlabeled data.
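To make the role of temperature concrete, the following is a minimal sketch of how a teacher's logits could be turned into a soft label and used to train a student. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example logits, the temperature values (0.5 for the teacher, 1.0 for the student), and the choice of cross-entropy as the loss are all hypothetical. The abstract states only that the teacher temperature is relatively low and that the two temperatures are fixed but different.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_label_cross_entropy(student_logits, soft_label, student_temperature=1.0):
    """Cross-entropy between a soft label and the student's tempered softmax."""
    probs = softmax_with_temperature(student_logits, student_temperature)
    return -sum(q * math.log(p) for q, p in zip(soft_label, probs) if q > 0)


# Hypothetical teacher logits for an ambiguous three-class image.
teacher_logits = [2.0, 1.5, -1.0]
one_hot = [1.0, 0.0, 0.0]  # the conventional label, shown for comparison

# Assumed values: the abstract gives no concrete temperatures.
TEACHER_T = 0.5
STUDENT_T = 1.0

# The soft label keeps some probability mass on the second class,
# which a one-hot vector cannot express.
soft_label = softmax_with_temperature(teacher_logits, TEACHER_T)

# Hypothetical student logits, scored against the teacher's soft label.
loss = soft_label_cross_entropy([1.0, 0.8, -0.5], soft_label, STUDENT_T)

print("soft label:", [round(p, 4) for p in soft_label])
print("loss:", round(loss, 4))
```

Lowering the teacher temperature sharpens the soft label toward the top class while still leaving nonzero mass on plausible alternatives. Raising it flattens the distribution.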