In light of the vulnerability of deep learning models to adversarial samples and the resulting security issues, a range of methods aimed at enhancing model robustness against various adversarial attacks, with Adversarial Training (AT) as a prominent representative, have developed rapidly. However, existing methods essentially help the target model, in its current state, defend against parameter-oriented adversarial attacks. This incurs explicit or implicit computational burdens and also leads to unstable convergence because the optimization trajectories are inconsistent. Departing from previous work, this paper revisits the update rule of the target model and the deficiency of defending on the basis of its current state. By introducing the historical state of the target model as a proxy, which carries rich prior information for defense, we formulate a two-stage update rule that yields a general adversarial defense framework, which we refer to as `LAST' ({\bf L}earn from the P{\bf ast}). Furthermore, we devise a Self-Distillation (SD) based defense objective that constrains the update process of the proxy model without introducing larger teacher models. Experiments across various datasets, backbones, and attack modalities show that LAST consistently and significantly improves a series of single-step and multi-step AT methods (e.g., Robust Accuracy (RA) gains of up to $\bf 9.2\%$ and $\bf 20.5\%$ on CIFAR10 and CIFAR100, respectively), while also enhancing training stability and mitigating catastrophic overfitting.
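The abstract does not spell out the two-stage update rule, so the following is only a toy mental model under stated assumptions, not the paper's formulation. We assume that the proxy is the target's parameter vector from the previous step, that stage one takes a plain gradient step on the proxy, and that stage two moves the target a fraction `beta` of the way toward the updated proxy. The names `two_stage_update`, `sgd_step`, `lr`, and `beta` are hypothetical. A quadratic loss stands in for the adversarial loss, and neither adversarial example generation nor the self-distillation term is modeled.

```python
from typing import Callable, List, Tuple

Vector = List[float]
GradFn = Callable[[Vector], Vector]


def sgd_step(params: Vector, grad_fn: GradFn, lr: float) -> Vector:
    """One plain gradient-descent step: params - lr * grad(params)."""
    grad = grad_fn(params)
    return [p - lr * g for p, g in zip(params, grad)]


def two_stage_update(
    target: Vector, proxy: Vector, grad_fn: GradFn, lr: float, beta: float
) -> Tuple[Vector, Vector]:
    """Hypothetical two-stage update (illustrative assumption only).

    Stage 1: update the proxy (a historical snapshot of the target)
             with a gradient step.
    Stage 2: move the target toward the updated proxy by a fraction
             `beta` of the remaining gap.
    The returned proxy is the pre-update target, so the proxy always
    lags the target by exactly one step.
    """
    updated_proxy = sgd_step(proxy, grad_fn, lr)                 # stage 1
    new_target = [t - beta * (t - p)                             # stage 2
                  for t, p in zip(target, updated_proxy)]
    return new_target, list(target)


if __name__ == "__main__":
    # Toy loss 0.5 * ||theta||^2, whose gradient is theta itself.
    quad_grad: GradFn = lambda theta: list(theta)
    target, proxy = [1.0, -2.0], [1.0, -2.0]
    for _ in range(200):
        target, proxy = two_stage_update(target, proxy, quad_grad, lr=0.1, beta=0.5)
    print(target)  # both coordinates shrink toward the minimizer at 0
```

On this toy problem the target follows the recurrence t_{k+1} = (1 - beta) t_k + beta (1 - lr) t_{k-1}, which converges to the minimizer for these settings. The snippet only illustrates the "proxy lags the target" bookkeeping; it says nothing about robustness.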