Transferability is the property of adversarial examples to be misclassified by models other than the surrogate model for which they were crafted. Previous research has shown that transferability increases substantially when the training of the surrogate model is stopped early. A common hypothesis to explain this is that models learn the non-robust features that adversarial attacks exploit during the later training epochs. Under this hypothesis, an early-stopped model is more robust, and hence a better surrogate, than a fully trained model. We demonstrate that early stopping improves transferability because of the side effects it has on the learning dynamics of the model. We first show that early stopping benefits transferability even for models trained on data with non-robust features. We then establish links between transferability and the exploration of the loss landscape in parameter space, on which early stopping has an inherent effect. More precisely, we observe that transferability peaks when the learning rate decays, which is also when the sharpness of the loss drops significantly. This leads us to propose RFN, a new approach that minimizes loss sharpness during training in order to maximize transferability. We show that by searching for large flat neighborhoods, RFN always improves over early stopping (by up to 47 percentage points of transferability rate) and is competitive with, if not better than, strong state-of-the-art baselines.
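To make the notion of transferability rate concrete, the following minimal sketch computes it as the percentage of adversarial examples that a target model misclassifies. The function name `transferability_rate` and the toy predictions are illustrative assumptions, not taken from the paper, whose exact evaluation protocol may differ.

```python
from typing import Sequence


def transferability_rate(target_preds: Sequence[int], true_labels: Sequence[int]) -> float:
    """Percentage of adversarial examples that the target model misclassifies.

    The adversarial examples are assumed to have been crafted against a
    different (surrogate) model; only the target's predictions are needed here.
    """
    if len(target_preds) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if len(true_labels) == 0:
        raise ValueError("at least one example is required")
    # An example "transfers" when the target's prediction differs from the true label.
    fooled = sum(1 for pred, label in zip(target_preds, true_labels) if pred != label)
    return 100.0 * fooled / len(true_labels)


# Hypothetical target-model predictions on five adversarial examples.
labels = [0, 1, 2, 1, 0]
preds = [0, 2, 2, 0, 1]
rate = transferability_rate(preds, labels)  # 3 of 5 examples are misclassified
```

Under this definition, an improvement of "47 points" means the percentage returned for one surrogate is 47 higher than for another on the same target and inputs.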
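The idea of minimizing loss sharpness during training can be illustrated with a sharpness-aware minimization (SAM) style update, which evaluates the gradient at a perturbed point within a small neighborhood of the current weights. This is a generic sketch on a toy quadratic loss and is not claimed to be the RFN procedure itself. The names `sam_step`, `toy_loss`, `toy_grad` and the values of `lr` and `rho` are assumptions chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(w: Vector, grad_fn: Callable[[Vector], Vector],
             lr: float = 0.05, rho: float = 0.1) -> Vector:
    """One SAM-style update: descend along the gradient taken at a nearby
    worst-case point instead of at the current weights."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(w)  # already at a stationary point, nothing to do
    # Move a distance rho along the gradient direction (first-order ascent step).
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Apply the gradient measured at the perturbed point to the original weights.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


# Toy loss (assumption): steep along w[0], flat along w[1].
A, B = 10.0, 0.1


def toy_loss(w: Vector) -> float:
    return 0.5 * (A * w[0] ** 2 + B * w[1] ** 2)


def toy_grad(w: Vector) -> Vector:
    return [A * w[0], B * w[1]]


w = [1.0, 1.0]
initial_loss = toy_loss(w)
for _ in range(200):
    w = sam_step(w, toy_grad)
final_loss = toy_loss(w)
```

The radius `rho` controls the size of the neighborhood over which the loss is kept low. Larger values favor wider flat regions, at the cost of a noisier descent near the minimum.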