This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs). Although adversarial training (AT) improves robustness, it degrades the standard accuracy, thus yielding the tradeoff. To mitigate this tradeoff, we propose a novel AT method called ARREST, which comprises three components: (i) adversarial finetuning (AFT), (ii) representation-guided knowledge distillation (RGKD), and (iii) noisy replay (NR). AFT trains a DNN on adversarial examples by initializing its parameters with a DNN that is standardly pretrained on clean examples. RGKD and NR respectively entail a regularization term and an algorithm to preserve latent representations of clean examples during AFT. RGKD penalizes the distance between the representations of the standardly pretrained and AFT DNNs. NR switches input adversarial examples to nonadversarial ones when the representation changes significantly during AFT. By combining these components, ARREST achieves both high standard accuracy and robustness. Experimental results demonstrate that ARREST mitigates the tradeoff more effectively than previous AT-based methods do.
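The interplay between RGKD and NR described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract specifies neither the distance metric, the regularization weight, nor the criterion for a "significant" representation change, so the Euclidean distance, the `weight` parameter, the `threshold` parameter, and all function names below are assumptions made purely for illustration.

```python
import math


def representation_distance(rep_a, rep_b):
    """Euclidean distance between two representation vectors.

    The metric is an assumption; the abstract only says "distance".
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("representations must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rep_a, rep_b)))


def rgkd_loss(task_loss, rep_pretrained, rep_finetuned, weight=1.0):
    """RGKD-style objective: task loss plus a representation-distance penalty.

    `weight` is a hypothetical regularization coefficient.
    """
    penalty = representation_distance(rep_pretrained, rep_finetuned)
    return task_loss + weight * penalty


def noisy_replay_select(adversarial_x, nonadversarial_x,
                        rep_pretrained, rep_finetuned, threshold):
    """NR-style switch: fall back to the nonadversarial input when the
    representation has drifted further than `threshold` (hypothetical
    criterion for a "significant" change)."""
    drift = representation_distance(rep_pretrained, rep_finetuned)
    return nonadversarial_x if drift > threshold else adversarial_x
```

In this sketch, `rep_pretrained` and `rep_finetuned` stand for the latent representations of the same clean example under the standardly pretrained DNN and the DNN undergoing AFT, respectively. A real training loop would compute them from network activations rather than plain lists.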