Deep neural networks are vulnerable to adversarial examples, which threatens their deployment and raises security concerns. An intriguing property of adversarial examples is their strong transferability. Several methods have been proposed to enhance transferability, including ensemble attacks, which have demonstrated their efficacy. However, prior approaches simply average logits, probabilities, or losses across the ensemble and lack a comprehensive analysis of how and why model ensembling significantly improves transferability. In this paper, we propose a similarity-targeted attack method named Similar Target (ST). By promoting cosine similarity between the gradients of the surrogate models, our method regularizes the optimization direction so that it attacks all surrogate models simultaneously. This strategy has been proven to enhance generalization ability. Experimental results on ImageNet validate the effectiveness of our approach in improving adversarial transferability. Our method outperforms state-of-the-art attackers on 18 discriminative classifiers and adversarially trained models.
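To make the idea of promoting gradient cosine similarity concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the exact objective, so the function names (`mean_pairwise_cosine`, `regularized_objective`), the weight `lam`, and the choice to add the similarity term to the averaged surrogate loss are all assumptions made for illustration. The per-model input gradients are assumed to be already computed and passed in as arrays.

```python
import numpy as np


def mean_pairwise_cosine(grads, eps=1e-12):
    """Average cosine similarity over all pairs of per-model input gradients.

    `grads` is a sequence of arrays with identical shapes, one per surrogate
    model (hypothetical interface, not taken from the paper).
    """
    if len(grads) < 2:
        raise ValueError("need gradients from at least two surrogate models")
    # Flatten each gradient to a vector and normalize it to unit length.
    flat = np.stack([np.asarray(g, dtype=np.float64).ravel() for g in grads])
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, eps)
    # Entry (i, j) of `sim` is the cosine similarity between models i and j.
    sim = unit @ unit.T
    # Keep only the strictly upper triangle, i.e. each unordered pair once.
    rows, cols = np.triu_indices(len(grads), k=1)
    return float(sim[rows, cols].mean())


def regularized_objective(losses, grads, lam=0.1):
    """Assumed objective for the attacker to maximize: the averaged surrogate
    loss plus `lam` times the mean pairwise gradient cosine similarity."""
    return float(np.mean(losses)) + lam * mean_pairwise_cosine(grads)


if __name__ == "__main__":
    aligned = [np.array([1.0, 0.0]), np.array([2.0, 0.0])]
    orthogonal = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    print(mean_pairwise_cosine(aligned))     # 1.0
    print(mean_pairwise_cosine(orthogonal))  # 0.0
```

Under this assumed formulation, a perturbation whose per-model gradients point in the same direction scores higher than one that raises the averaged loss equally but with conflicting gradients, which is the sense in which the optimization direction is pushed to attack all surrogate models at once.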