Adversarial examples have attracted widespread attention in security-critical applications because they transfer across different models. Although many methods have been proposed to boost adversarial transferability, a gap remains between current attack capabilities and practical demands. In this paper, we argue that model-specific discriminative regions are a key factor that causes adversarial perturbations to overfit the source model, which in turn reduces their transferability to target models. To address this, we use a patch-wise mask to prune the model-specific regions when computing adversarial perturbations. To localize these regions accurately, we present a learnable approach that optimizes the mask automatically. Specifically, we simulate the target models within our framework and adjust the patch-wise mask according to feedback from the simulated models. To improve efficiency, we use the differential evolution (DE) algorithm to search for a patch-wise mask for each specific image. During iterative attacks, the learned masks are applied to the image to drop the patches associated with model-specific regions, which makes the gradients more generic and improves adversarial transferability. The proposed approach is a preprocessing method and can be integrated with existing methods to further boost transferability. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of our method. When combined with existing methods to perform ensemble attacks, our approach achieves an average success rate of 93.01% against seven advanced defense methods, improving on state-of-the-art transfer-based attack performance.
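The per-image mask search can be illustrated with a standard DE/rand/1/bin loop. The abstract does not specify the encoding, mutation strategy, or fitness, so this is a minimal sketch under stated assumptions: each patch gets a continuous gene in [0, 1] that is thresholded at 0.5 into a binary mask (1 = keep the patch, 0 = drop it), the caller supplies a `fitness(mask)` to maximize, and a toy fitness merely stands in for the feedback of the simulated models. The names `decode`, `de_search_mask`, and `toy_fitness`, and the values of `F`, `CR`, population size, and generation count, are hypothetical.

```python
import numpy as np

def decode(genome, grid_shape):
    """Threshold a continuous genome in [0, 1] into a binary patch mask (1 = keep, 0 = drop)."""
    return (genome > 0.5).astype(np.int8).reshape(grid_shape)

def de_search_mask(fitness, grid_shape, pop_size=20, generations=50, F=0.5, CR=0.7, seed=0):
    """Maximize fitness(mask) over binary patch masks using DE/rand/1/bin on a relaxed genome."""
    rng = np.random.default_rng(seed)
    dim = int(np.prod(grid_shape))
    pop = rng.uniform(0.0, 1.0, size=(pop_size, dim))
    scores = np.array([fitness(decode(ind, grid_shape)) for ind in pop])
    history = [scores.max()]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            candidates = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
            mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), 0.0, 1.0)
            # Binomial crossover; forcing one random gene guarantees the trial differs from pop[i].
            cross = rng.uniform(size=dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            trial_score = fitness(decode(trial, grid_shape))
            # Greedy selection: keep the trial only if it is at least as fit.
            if trial_score >= scores[i]:
                pop[i], scores[i] = trial, trial_score
        history.append(scores.max())
    best = int(np.argmax(scores))
    return decode(pop[best], grid_shape), scores[best], history

# Toy stand-in for simulated-model feedback (hypothetical): reward agreement with a
# hidden reference mask that marks which patches are "generic" enough to keep.
reference = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1]])

def toy_fitness(mask):
    return float((mask == reference).mean())

best_mask, best_score, history = de_search_mask(toy_fitness, grid_shape=(3, 3))
```

Greedy selection makes the best score non-decreasing across generations. In the described method, the fitness would instead come from how well adversarial examples crafted with a candidate mask fool the simulated models.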
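How a learned patch-wise mask could enter an iterative attack can be shown with a minimal NumPy sketch. It assumes a toy linear softmax classifier as the source model (so the input gradient has a closed form), an I-FGSM-style update under an L-infinity budget `eps`, and a binary patch mask where 1 keeps a patch and 0 drops it. The helper names (`upsample_mask`, `input_gradient`, `masked_ifgsm`) and all parameter values are hypothetical, not taken from the paper. By the chain rule, the gradient of the loss on the masked image with respect to the unmasked input is the mask times the gradient at the masked input, so dropped patches receive no update in this sketch.

```python
import numpy as np

def upsample_mask(patch_mask, patch_size):
    """Expand a (gh, gw) binary patch mask into a (gh*p, gw*p) pixel mask."""
    return np.kron(patch_mask, np.ones((patch_size, patch_size)))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(weights, bias, x, label):
    """Cross-entropy loss of a linear softmax classifier on a flattened input."""
    z = weights @ x + bias
    z = z - z.max()
    return float(np.log(np.exp(z).sum()) - z[label])

def input_gradient(weights, bias, x, label):
    """d(cross-entropy)/dx for the linear softmax classifier: W^T (softmax(Wx + b) - onehot)."""
    p = softmax(weights @ x + bias)
    p[label] -= 1.0
    return weights.T @ p

def masked_ifgsm(weights, bias, image, label, patch_mask, patch_size, eps=8 / 255, steps=10):
    """I-FGSM in which every gradient is taken through the patch-masked image."""
    alpha = eps / steps
    pixel_mask = upsample_mask(patch_mask, patch_size)[..., None]  # (H, W, 1), broadcasts over channels
    x_adv = image.copy()
    for _ in range(steps):
        masked = x_adv * pixel_mask  # drop patches flagged as model-specific
        g_masked = input_gradient(weights, bias, masked.ravel(), label).reshape(image.shape)
        grad = pixel_mask * g_masked  # chain rule back to x_adv: dropped patches get zero gradient
        x_adv = x_adv + alpha * np.sign(grad)
        x_adv = np.clip(x_adv, image - eps, image + eps)  # stay inside the L-inf ball
        x_adv = np.clip(x_adv, 0.0, 1.0)  # stay a valid image
    return x_adv

# Toy usage: an 8x8 RGB "image" split into a 2x2 grid of 4x4 patches.
rng = np.random.default_rng(0)
H, W, C, p, num_classes = 8, 8, 3, 4, 5
weights = rng.normal(size=(num_classes, H * W * C))
bias = np.zeros(num_classes)
image = rng.uniform(0.2, 0.8, size=(H, W, C))
patch_mask = np.array([[1, 0], [1, 1]])  # hypothetical learned mask: drop the top-right patch
x_adv = masked_ifgsm(weights, bias, image, label=2, patch_mask=patch_mask, patch_size=p)
```

With a single fixed mask, only the kept patches are perturbed; whether the method re-samples or combines several masks across iterations is not stated in the abstract and would change this detail.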