Adversarial Training (AT), which adversarially perturbs the input samples during training, is widely acknowledged as one of the most effective defenses against adversarial attacks, yet it suffers from a fundamental tradeoff that inevitably degrades clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, since SAM is designed to improve clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness conferred by SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property, we first provide empirical and theoretical insights into how SAM implicitly learns more robust features, and then conduct comprehensive experiments showing that SAM improves adversarial robustness notably without sacrificing any clean accuracy, shedding light on the potential of SAM to serve as a substitute for AT when clean accuracy is the higher priority. Code is available at https://github.com/weizeming/SAM_AT.
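To illustrate the weight-perturbation mechanism the abstract describes, the following is a minimal sketch of a SAM update on a simple least-squares problem. It is not the paper's implementation; the loss, data, and hyperparameters (`lr`, `rho`) are illustrative assumptions. Each step ascends to an approximate worst-case point within an L2 ball of radius `rho` around the current weights, then applies the gradient computed there back at the original weights:

```python
import numpy as np

def loss(w, X, y):
    # Mean squared error of a linear model (illustrative stand-in
    # for a training loss; the paper uses neural networks).
    return 0.5 * np.mean((X @ w - y) ** 2)

def grad(w, X, y):
    # Gradient of the loss above with respect to the weights w.
    return X.T @ (X @ w - y) / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    """One SAM update (sketch): perturb the weights toward the
    worst case within an L2 ball of radius rho, then descend at w
    using the gradient evaluated at the perturbed weights."""
    g = grad(w, X, y)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascent direction, norm rho
    g_sharp = grad(w + eps, X, y)                # gradient at perturbed point
    return w - lr * g_sharp                      # update the original weights

# Toy regression data (hypothetical, for demonstration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true

w = np.zeros(5)
for _ in range(200):
    w = sam_step(w, X, y)
```

Note the contrast with AT: AT perturbs the inputs `X` inside the loss, whereas SAM perturbs the weights `w`, which is the duality the abstract refers to. On a quadratic loss SAM converges near, but not exactly to, the minimizer, with a small bias controlled by `rho`.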