Transferable adversarial examples pose practical security risks because they can mislead a target model without any knowledge of its internals. The conventional recipe for maximizing transferability is to keep only the optimal adversarial example among all those obtained in the optimization pipeline. In this paper, we question this convention for the first time and demonstrate that the discarded, sub-optimal adversarial examples can be reused to boost transferability. Specifically, we propose ``Adversarial Example Soups'' (AES), with AES-tune averaging the adversarial examples discarded during hyperparameter tuning and AES-rand averaging those discarded during stability testing. Our AES is inspired by ``model soups'', which average the weights of multiple fine-tuned models to improve accuracy without increasing inference time. Extensive experiments validate the general effectiveness of AES, which boosts 10 state-of-the-art transfer attacks and their combinations by up to 13% against 10 diverse (defensive) target models. We also show that AES can be generalized to other settings; e.g., directly averaging multiple in-the-wild adversarial examples yields comparable success. A promising byproduct of AES is the improved stealthiness of the adversarial examples, since averaging naturally reduces the perturbation variance.
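The core "soup" operation described above is a pixel-wise average of several adversarial examples crafted for the same clean image. The following is a minimal sketch under simplifying assumptions, not the authors' implementation. Images are flattened lists of floats in [0, 1], and the candidates stand in for the examples an attack would otherwise discard. The helper names (`adversarial_example_soup`, `linf_distance`, `perturbation_variance`) and the toy data are hypothetical. Because both the L-infinity ball of radius eps around the clean image and the [0, 1] pixel box are convex, the average of candidates that lie inside both also lies inside both, so no extra projection step is needed. Measuring "perturbation variance" as the variance of the perturbation values across pixels is an assumption made here for illustration. Under that measure, convexity (Jensen's inequality) guarantees that the soup's variance never exceeds the mean variance of the candidates.

```python
from statistics import pvariance
from typing import List, Sequence

# A flattened image whose pixel values lie in [0, 1] (simplifying assumption).
Image = List[float]


def adversarial_example_soup(candidates: Sequence[Image]) -> Image:
    """Pixel-wise average of adversarial examples for one clean image.

    Hypothetical helper: the candidates stand in for the examples an attack
    would normally discard, e.g. one per hyperparameter setting (AES-tune)
    or one per random seed (AES-rand).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    size = len(candidates[0])
    if any(len(c) != size for c in candidates):
        raise ValueError("all candidates must have the same number of pixels")
    n = len(candidates)
    # zip(*candidates) yields, for each pixel index, that pixel from every candidate.
    return [sum(pixels) / n for pixels in zip(*candidates)]


def linf_distance(a: Image, b: Image) -> float:
    """L-infinity distance, a common perturbation budget in transfer attacks."""
    return max(abs(x - y) for x, y in zip(a, b))


def perturbation_variance(adv: Image, clean: Image) -> float:
    """Population variance of the perturbation values across pixels."""
    return pvariance([x - y for x, y in zip(adv, clean)])


if __name__ == "__main__":
    clean = [0.2, 0.5, 0.8, 0.4]
    eps = 0.1
    # Toy candidates, each moving every pixel by about +/-0.1
    # (illustrative data, not outputs of a real attack).
    candidates = [
        [0.3, 0.4, 0.9, 0.5],
        [0.1, 0.6, 0.7, 0.5],
        [0.3, 0.6, 0.9, 0.3],
    ]
    soup = adversarial_example_soup(candidates)
    print("soup:", [round(p, 4) for p in soup])
    print("L_inf to clean:", round(linf_distance(soup, clean), 4), "budget:", eps)
    mean_var = sum(perturbation_variance(c, clean) for c in candidates) / len(candidates)
    print("mean candidate perturbation variance:", round(mean_var, 6))
    print("soup perturbation variance:", round(perturbation_variance(soup, clean), 6))
```

In this sketch, the soup step is a single pass over the candidates. The attack runs that produce those candidates are assumed to have already happened as part of tuning or stability testing.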