Adversarial training is a practical approach for improving the robustness of deep neural networks against adversarial attacks. Although it yields reliable robustness, adversarial training degrades performance on clean examples, which implies a trade-off between accuracy and robustness. Recent studies have applied knowledge distillation to adversarial training and achieved competitive robustness, but accuracy on clean samples remains limited. In this paper, to mitigate the accuracy-robustness trade-off, we introduce Multi-Teacher Adversarial Robustness Distillation (MTARD), which guides the model's adversarial training with a strong clean teacher and a strong robust teacher that handle clean examples and adversarial examples, respectively. During optimization, to ensure that the different teachers provide knowledge at similar scales, we design the Entropy-Based Balance algorithm, which adjusts the teachers' temperatures to keep their information entropy consistent. In addition, to ensure that the student learns from the multiple teachers at a relatively consistent speed, we propose the Normalization Loss Balance algorithm, which adjusts the learning weights of the different types of knowledge. A series of experiments conducted on public datasets demonstrates that MTARD outperforms state-of-the-art adversarial training and distillation methods against various adversarial attacks.
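The two balancing ideas can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the abstract does not specify how the temperatures or loss weights are updated, so the sketch substitutes a plain bisection search for the temperature and a simple relative-loss rule for the weights. All names here (`match_temperature`, `balance_weights`, `beta`, the toy logits and the target entropy) are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits, temperature):
    # Temperature-scaled softmax, shifted by the row max for numerical stability.
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_entropy(logits, temperature):
    # Average Shannon entropy (in nats) of the softened predictions over the batch.
    p = softmax(logits, temperature)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

def match_temperature(logits, target_entropy, lo=0.05, hi=50.0, iters=60):
    # ASSUMPTION: bisection stands in for whatever update rule the paper uses.
    # It works because softmax entropy grows monotonically with temperature.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean_entropy(logits, mid) < target_entropy:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def balance_weights(current_losses, initial_losses, beta=1.0):
    # ASSUMPTION: a loss that has dropped less relative to its starting value
    # gets a larger weight; weights are normalized to sum to 1.
    rel = (np.asarray(current_losses) / np.asarray(initial_losses)) ** beta
    return rel / rel.sum()

# Toy teacher logits: the "clean" teacher is made much more confident.
rng = np.random.default_rng(0)
clean_logits = rng.normal(size=(8, 10)) * 5.0
robust_logits = rng.normal(size=(8, 10))

target = 1.5  # hypothetical shared entropy target, below ln(10)
t_clean = match_temperature(clean_logits, target)
t_robust = match_temperature(robust_logits, target)

# Student losses against each teacher: current value and value at step 0.
weights = balance_weights([0.9, 0.3], [1.0, 1.0])
```

After matching, both teachers produce soft labels with the same average entropy, so neither dominates the distillation signal merely by being more or less confident. The weight rule then shifts emphasis toward the teacher whose loss has decreased less, which is one simple way to keep the student's learning speed comparable across the two types of knowledge.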