Adversarial Robustness Distillation (ARD) is a novel method for boosting the robustness of small models. Unlike standard adversarial training, its transfer of robust knowledge is less constrained by model capacity. However, the teacher model that provides the robust knowledge does not always make correct predictions, which interferes with the student's robust performance. Moreover, in previous ARD methods the robustness comes entirely from one-to-one imitation, which ignores the relationships between examples. To address both issues, we propose a novel structured ARD method called Contrastive Relationship DeNoise Distillation (CRDND). We design an adaptive compensation module to model the instability of the teacher. We also use contrastive relationships to explore implicit robustness knowledge among multiple examples. Experimental results on multiple attack benchmarks show that CRDND transfers robust knowledge efficiently and achieves state-of-the-art performance.
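To make the idea of a contrastive relationship among multiple examples more concrete, the following is a minimal sketch of a generic InfoNCE-style relational distillation loss. It is not the CRDND objective, which the abstract does not specify. The function names, the temperature value, and the choice of matching each student feature to the teacher feature of the same example (with the other examples in the batch acting as negatives) are all assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def contrastive_relation_loss(student_feats, teacher_feats, temperature=0.1):
    """Hypothetical InfoNCE-style loss (illustrative, not the paper's loss).

    Row i of student_feats is pulled toward row i of teacher_feats (positive)
    and pushed away from all other teacher rows in the batch (negatives).
    """
    s = l2_normalize(student_feats)
    t = l2_normalize(teacher_feats)
    logits = s @ t.T / temperature                      # shape (N, N)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal of the similarity matrix.
    return float(-np.mean(np.diag(log_probs)))


# Toy data: random "teacher" features, a student that nearly matches them,
# and a student whose rows are matched to the wrong examples.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))
aligned = teacher + 0.01 * rng.normal(size=(8, 16))
shuffled = teacher[::-1].copy()

loss_aligned = contrastive_relation_loss(aligned, teacher)
loss_shuffled = contrastive_relation_loss(shuffled, teacher)
print(loss_aligned < loss_shuffled)
```

In this toy setup the loss is lower when the student's features preserve the teacher's example-to-example structure than when they are matched to the wrong examples. That is the kind of relational signal one-to-one imitation alone does not capture. In an ARD setting the student features would presumably come from adversarial examples, but that detail is likewise an assumption here.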