Adversarial training has been shown to be the most effective approach to defending against adversarial attacks. However, existing adversarial training methods exhibit noticeable oscillations and overfitting during training, which degrade their defense efficacy. In this work, we propose a novel framework, termed Parameter Interpolation based Adversarial Training (PIAT), that makes full use of historical information during training. Specifically, at the end of each epoch, PIAT sets the model parameters to an interpolation between the parameters of the previous epoch and those of the current epoch. In addition, we propose using the Normalized Mean Square Error (NMSE) to further improve robustness by aligning the logits of clean and adversarial examples. Compared with other regularization methods, NMSE focuses on the relative rather than the absolute magnitude of the logits. Extensive experiments on several benchmark datasets and various networks show that our method significantly improves model robustness and reduces the generalization error. Moreover, our framework is general and can further boost robust accuracy when combined with other adversarial training methods.
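The two components described above can be sketched in dependency-free Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions. Parameters are represented as a hypothetical dict of flat float lists instead of framework tensors. The interpolation weight `alpha` is a hypothetical hyperparameter, because the abstract does not specify the weighting or its schedule. NMSE is assumed to take the common form ||z_adv - z_clean||^2 / ||z_clean||^2 over the logits.

```python
from typing import Dict, List

# Hypothetical parameter container: name -> flattened list of weights.
Params = Dict[str, List[float]]


def interpolate_params(prev: Params, curr: Params, alpha: float) -> Params:
    """Return alpha * prev + (1 - alpha) * curr, element-wise per parameter.

    `alpha` is an assumed interpolation weight; the abstract does not fix it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if prev.keys() != curr.keys():
        raise ValueError("previous and current parameters must have the same names")
    out: Params = {}
    for name in curr:
        if len(prev[name]) != len(curr[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        out[name] = [alpha * p + (1.0 - alpha) * c
                     for p, c in zip(prev[name], curr[name])]
    return out


def nmse(clean_logits: List[float], adv_logits: List[float],
         eps: float = 1e-12) -> float:
    """Assumed NMSE: squared error between the adversarial and clean logits,
    normalized by the squared norm of the clean logits (eps avoids division
    by zero)."""
    if len(clean_logits) != len(adv_logits):
        raise ValueError("logit vectors must have the same length")
    err = sum((a - c) ** 2 for a, c in zip(adv_logits, clean_logits))
    ref = sum(c ** 2 for c in clean_logits)
    return err / (ref + eps)


if __name__ == "__main__":
    # Epoch-end interpolation with a hypothetical alpha = 0.5.
    print(interpolate_params({"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, 0.5))
    # Scaling both logit vectors by 2 leaves NMSE (almost) unchanged,
    # whereas a plain squared error would grow by a factor of 4.
    print(nmse([1.0, 0.0], [1.0, 1.0]), nmse([2.0, 0.0], [2.0, 2.0]))
```

Because the squared error is divided by the squared norm of the clean logits, NMSE depends on the relative magnitude of the logits rather than their absolute scale. In one plausible training loop, a copy of the parameters from the previous epoch is kept, `interpolate_params` is applied after each epoch, and its result is stored as the next "previous" parameters. The exact bookkeeping, and how NMSE is weighted against the adversarial loss, are not specified in the abstract.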