Going Further: Flatness at the Rescue of Early Stopping for Adversarial Example Transferability

Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability increases substantially when the training of the surrogate model is stopped early. A common hypothesis to explain this is that models learn the non-robust features exploited by adversarial attacks during the later training epochs. Under this hypothesis, an early-stopped model is more robust, and therefore a better surrogate, than a fully trained model. We demonstrate instead that early stopping improves transferability because of the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even on models learning from data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in the parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also when the sharpness of the loss drops significantly. This leads us to propose RFN, a new approach that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 percentage points in transferability rate) and is competitive with, if not better than, strong state-of-the-art baselines.
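The abstract does not say how RFN minimizes loss sharpness, so the following is only a minimal sketch of a generic sharpness-aware update, not the authors' procedure. It assumes the usual two-step scheme: first move the weights to an approximate worst point inside a neighborhood of radius `rho`, then apply the gradient computed there to the original weights. The names `sam_step`, `toy_loss`, `toy_grad`, `lr`, and `rho` are hypothetical, and the quadratic loss is a stand-in for a real training loss.

```python
import math

def toy_loss(w):
    # Hypothetical toy loss: steep along w[0], flat along w[1].
    return 0.5 * (10.0 * w[0] ** 2 + w[1] ** 2)

def toy_grad(w):
    # Analytic gradient of toy_loss.
    return [10.0 * w[0], w[1]]

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update on a parameter vector (list of floats)."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # stationary point: no ascent direction is defined
    # Ascent step: first-order worst point inside an L2 ball of radius rho.
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descent step: use the gradient at the perturbed point to update
    # the original weights, which penalizes sharp regions of the loss.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

start = [1.0, 1.0]
w = list(start)
for _ in range(100):
    w = sam_step(w, toy_grad)

print("initial loss:", toy_loss(start))
print("final loss:", toy_loss(w))
```

In this sketch a larger `rho` means a larger neighborhood is searched, which is one way to read "searching for large flat neighborhoods"; the value actually used by RFN is not given in the abstract.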

Adversarial examples

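Adversarial examples, introduced by Christian Szegedy et al., are inputs formed by deliberately adding subtle perturbations to samples from a dataset so that a model outputs a wrong answer with high confidence. In the context of regularization, adversarial training, that is, training the network on adversarially perturbed training examples, can reduce the error rate on the original i.i.d. test set. In essence, an adversarial example is produced by adding a perturbation that humans can hardly perceive to an input, and it is able to change the behavior of an AI model. It has two basic goals: first, to change the model's prediction; second, to keep the added perturbation small enough that, to a human, it does not appear sufficient to change the prediction, so that it looks harmless on the surface. Research on adversarial examples is highly relevant to application scenarios such as autonomous driving and smart homes.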
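The two goals above (flip the prediction, keep the perturbation small) can be illustrated with a minimal sketch on a toy linear classifier. Everything here is an assumption made for illustration: the `predict`, `sign`, and `perturb` helpers, the weights of the `surrogate` and `target` models, the input `x`, and the budget `eps` are hypothetical. The perturbation is a single sign-gradient step, one common way of crafting adversarial examples. The last check shows transferability in the sense used in the abstract: the example is crafted on the surrogate and then tested on a different model.

```python
def predict(weights, bias, x):
    """Binary linear classifier: returns 1 if w.x + b > 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

def sign(v):
    # Returns -1, 0, or 1.
    return (v > 0) - (v < 0)

def perturb(weights, x, label, eps):
    """Single sign-gradient step against a linear surrogate.

    For a linear score the gradient with respect to x is `weights`,
    so each coordinate moves by exactly eps in the direction that
    pushes the score away from the current label.
    """
    direction = -1 if label == 1 else 1
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, weights)]

# Hypothetical surrogate and target models: (weights, bias).
surrogate = ([1.0, -2.0, 0.5], 0.0)
target = ([0.8, -1.5, 0.7], 0.1)

x = [0.2, -0.1, 0.1]   # clean input
eps = 0.2              # per-coordinate perturbation budget

label = predict(*surrogate, x)
x_adv = perturb(surrogate[0], x, label, eps)

print("clean prediction (surrogate):", label)
print("adversarial prediction (surrogate):", predict(*surrogate, x_adv))
print("clean prediction (target):", predict(*target, x))
print("adversarial prediction (target):", predict(*target, x_adv))
```

A transferability rate over a dataset would then be the fraction of such examples that the target model misclassifies.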
