Deep neural networks are easily fooled into making incorrect predictions when their inputs are corrupted by adversarial perturbations: artificial noise that is imperceptible to humans. So far, adversarial training has been the most successful defense against such attacks. This work focuses on improving adversarial training to boost adversarial robustness. We first analyze, from an instance-wise perspective, how adversarial vulnerability evolves during adversarial training. We find that, during training, the overall reduction in adversarial loss is achieved at the cost of making a considerable proportion of training samples more vulnerable to adversarial attack, which results in an uneven distribution of adversarial vulnerability across the data. Such "uneven vulnerability" is prevalent across several popular robust training methods and, more importantly, relates to overfitting in adversarial training. Motivated by this observation, we propose a new adversarial training method: Instance-adaptive Smoothness Enhanced Adversarial Training (ISEAT). It jointly smooths both the input and weight loss landscapes in an adaptive, instance-specific way, enhancing robustness more for samples with higher adversarial vulnerability. Extensive experiments demonstrate the superiority of our method over existing defense methods. Notably, when combined with the latest data augmentation and semi-supervised learning techniques, our method achieves state-of-the-art robustness against $\ell_{\infty}$-norm constrained attacks on CIFAR10: 59.32% for Wide ResNet34-10 without extra data, and 61.55% for Wide ResNet28-10 with extra data. Code is available at https://github.com/TreeLLi/Instance-adaptive-Smoothness-Enhanced-AT.
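To make the instance-wise view concrete, the following is a minimal sketch of how per-sample adversarial loss can be measured and turned into per-sample weights. It is not the ISEAT algorithm. It assumes a toy linear classifier, for which the worst-case $\ell_{\infty}$ perturbation has a closed form, and it uses a softmax over losses as one hypothetical weighting rule. All function names, the toy data, and the `temperature` parameter are illustrative assumptions.

```python
import math


def adversarial_margin(w, b, x, y, eps):
    """Worst-case margin of a linear classifier under an l_inf attack.

    For f(x) = w.x + b and a label y in {-1, +1}, the minimum of
    y * f(x + delta) over ||delta||_inf <= eps equals
    y * f(x) - eps * ||w||_1.
    """
    clean = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return clean - eps * sum(abs(wi) for wi in w)


def logistic_loss(margin):
    # log(1 + exp(-margin)), arranged to stay stable for large |margin|.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def instance_weights(losses, temperature=1.0):
    """Hypothetical weighting rule: softmax over per-sample adversarial loss,
    so that more vulnerable samples receive larger weights."""
    top = max(losses)
    exps = [math.exp((loss - top) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]


# Toy model and data (assumed values, for illustration only).
w, b, eps = [1.0, -2.0], 0.0, 0.1
data = [([3.0, 0.0], 1), ([0.2, 0.0], 1), ([0.0, 1.0], -1)]

# Per-sample adversarial loss: its spread is the "uneven vulnerability".
losses = [logistic_loss(adversarial_margin(w, b, x, y, eps)) for x, y in data]
weights = instance_weights(losses)

for (x, y), loss, weight in zip(data, losses, weights):
    print(f"x={x} y={y:+d} adversarial_loss={loss:.3f} weight={weight:.3f}")
```

In this toy setting the second sample sits closest to the decision boundary, so it has the highest adversarial loss and receives the largest weight. An instance-adaptive method would, in the same spirit, apply stronger regularization to such samples.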