Adversarial training is widely considered essential for safely deploying neural-network-based applications in the real world. To achieve stronger robustness, existing methods primarily focus on generating stronger attacks by increasing the number of update steps, regularizing models with a smoothed loss function, and injecting randomness into the attack. Instead, we analyze the behavior of adversarial training through the lens of response frequency. We empirically find that adversarial training causes neural networks to converge slowly on high-frequency information, resulting in highly oscillating predictions near each data point. To learn high-frequency content efficiently and effectively, we first prove that a universal phenomenon, the frequency principle (i.e., \textit{lower frequencies are learned first}), still holds in adversarial training. Building on this, we propose phase-shifted adversarial training (PhaseAT), in which the model learns high-frequency components by shifting these frequencies to the low-frequency range, where convergence is fast. For evaluation, we conduct experiments on CIFAR-10 and ImageNet with an adaptive attack carefully designed for reliable evaluation. Comprehensive results show that PhaseAT significantly improves convergence on high-frequency information, which in turn improves adversarial robustness by enabling the model to produce smoother predictions near each data point.
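The idea of moving high-frequency content into the low-frequency range can be illustrated with a generic signal-processing sketch. This is not the PhaseAT procedure itself, only the standard fact that multiplying a signal by a complex exponential shifts its spectrum. The function names (`dominant_frequency`, `shift_down`), the sample rate, and the 40 Hz and 38 Hz values are assumptions chosen for illustration.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (in Hz) of the largest-magnitude FFT bin."""
    spectrum = np.fft.fft(signal)
    freqs = np.fft.fftfreq(len(signal), d=1.0 / sample_rate)
    return freqs[np.argmax(np.abs(spectrum))]


def shift_down(signal, sample_rate, shift_hz):
    """Shift the whole spectrum down by shift_hz via complex modulation."""
    t = np.arange(len(signal)) / sample_rate
    return signal * np.exp(-2j * np.pi * shift_hz * t)


# Hypothetical setup: one second of a 40 Hz complex tone sampled at 256 Hz.
sample_rate = 256
t = np.arange(sample_rate) / sample_rate
high = np.exp(2j * np.pi * 40.0 * t)

# Shifting by 38 Hz moves the 40 Hz component down to 2 Hz.
low = shift_down(high, sample_rate, 38.0)

print(dominant_frequency(high, sample_rate))
print(dominant_frequency(low, sample_rate))
```

In this toy setting the spectral peak moves from 40 Hz to 2 Hz, so a learner that fits low frequencies first would meet the shifted component early. How PhaseAT applies such shifts inside adversarial training is specified in the paper, not in this sketch.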