In recent years, there has been an explosion of research into making deep neural networks more robust against adversarial examples. Adversarial training has emerged as one of the most successful methods. To balance robustness against adversarial examples with accuracy on clean examples, many works develop enhanced adversarial training methods that achieve various trade-offs between the two. Building on studies showing that smoothed weight updates during training may help find flat minima and improve generalization, we suggest reconciling the robustness-accuracy trade-off from another perspective, i.e., by adding random noise to deterministic weights. The randomized weights enable us to design a novel adversarial training method via a Taylor expansion in a small Gaussian noise, and we show that the new method can flatten the loss landscape and find flat minima. Under PGD, CW, and Auto Attacks, an extensive set of experiments demonstrates that our method enhances state-of-the-art adversarial training methods, boosting both robustness and clean accuracy. The code is available at https://github.com/Alexkael/Randomized-Adversarial-Training.
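As a rough illustration of why a Taylor expansion in small Gaussian weight noise relates to flat minima, the following is a generic second-order expansion, not the paper's actual training objective. The symbols $\mathcal{L}$ (loss), $w$ (deterministic weights), $u$ (noise), $\sigma$ (noise scale), and $H$ (Hessian of the loss with respect to the weights) are notation assumed here and do not appear in the abstract.

```latex
% Assumed notation: u is zero-mean isotropic Gaussian noise added to the weights w.
\[
  u \sim \mathcal{N}(0, \sigma^{2} I)
\]
% Second-order Taylor expansion of the loss around the deterministic weights.
\[
  \mathcal{L}(w + u)
  = \mathcal{L}(w)
  + \nabla_{w}\mathcal{L}(w)^{\top} u
  + \tfrac{1}{2}\, u^{\top} H(w)\, u
  + O(\lVert u \rVert^{3})
\]
% Taking the expectation over u: the first-order term vanishes because E[u] = 0,
% and E[u^T H u] = sigma^2 tr(H).
\[
  \mathbb{E}_{u}\bigl[\mathcal{L}(w + u)\bigr]
  \approx \mathcal{L}(w) + \frac{\sigma^{2}}{2}\, \operatorname{tr}\bigl(H(w)\bigr)
\]
```

Under these assumptions, minimizing the expected loss over noisy weights adds a penalty proportional to the trace of the Hessian, which favors low-curvature regions. This is one standard way to read the abstract's claim that randomized weights help flatten the loss landscape; the method in the paper may use a different expansion or additional terms.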