Fast adversarial training (FAT) improves the adversarial robustness of neural networks. However, prior FAT methods suffer from catastrophic overfitting under large perturbation budgets, \ie the adversarial robustness of the model drops to near zero during training. To address this, we analyze the training process of prior FAT methods and observe that catastrophic overfitting is accompanied by the appearance of loss convergence outliers. We therefore argue that a moderately smooth loss convergence process yields a stable FAT process that avoids catastrophic overfitting. To obtain a smooth loss convergence process, we propose a novel oscillatory constraint (dubbed ConvergeSmooth) that limits the loss difference between adjacent epochs. A convergence stride is introduced in ConvergeSmooth to balance convergence and smoothing. Likewise, we design weight centralization, which introduces no hyperparameters other than the loss balance coefficient. The proposed methods are attack-agnostic and can thus improve the training stability of various FAT techniques. Extensive experiments on popular datasets show that the proposed methods efficiently avoid catastrophic overfitting and outperform all previous FAT methods. Code is available at \url{https://github.com/FAT-CS/ConvergeSmooth}.
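To make the idea of limiting the loss difference between adjacent epochs concrete, the following is a minimal sketch and not the released implementation. It assumes a hinge-style penalty on the gap between the current loss and the loss recorded in the previous epoch. The function name `smoothed_loss` and the parameters `stride` (standing in for the convergence stride) and `balance` (standing in for the loss balance coefficient) are hypothetical names chosen for illustration; the exact form of the constraint is given in the paper and the linked repository.

```python
def smoothed_loss(current_loss, previous_epoch_loss, stride, balance):
    """Illustrative penalty on the loss change between adjacent epochs.

    Assumption: the constraint is modeled as a hinge on the absolute loss gap.
    A gap within `stride` is tolerated so training can still converge; only the
    excess beyond `stride` is penalized, weighted by `balance`.
    """
    gap = abs(current_loss - previous_epoch_loss)
    penalty = max(gap - stride, 0.0)  # zero when the change is moderate
    return current_loss + balance * penalty


if __name__ == "__main__":
    # Moderate change between epochs: no penalty is added.
    print(smoothed_loss(1.0, 1.05, stride=0.1, balance=2.0))
    # Abrupt jump in loss (an "outlier"): the excess gap is penalized.
    print(smoothed_loss(2.0, 1.0, stride=0.1, balance=2.0))
```

Under this assumed form, a larger `stride` favors faster convergence while a smaller one enforces a smoother loss curve, which mirrors the trade-off the abstract attributes to the convergence stride.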