Adversarial examples mislead deep neural networks with imperceptible perturbations and pose a significant threat to deep learning. An important property is their transferability: adversarial examples crafted on one model can deceive other models, which enables attacks in the black-box setting. Although various methods have been proposed to boost transferability, their performance still falls short of white-box attacks. In this work, we observe that existing input transformation based attacks, one of the mainstream families of transfer-based attacks, yield different attention heatmaps on different models, which might limit transferability. We also find that breaking the intrinsic relation of the image can disrupt the attention heatmap of the original image. Based on this finding, we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically, BSR splits the input image into several blocks, then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR achieves significantly better transferability than existing input transformation based methods under both single-model and ensemble-model settings. Combining BSR with existing input transformation methods further improves transferability and significantly outperforms state-of-the-art methods. Code is available at https://github.com/Trustworthy-AI-Group/BSR
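The block shuffle and rotation step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The abstract does not specify how the image is split or how rotation angles are chosen, so several details here are assumptions: the image is cut into an equal `n_blocks` x `n_blocks` grid, each block is rotated about its own centre by an angle drawn uniformly from `[-max_angle, max_angle]` with nearest-neighbour sampling and zero fill, and the names `rotate_block`, `bsr_transform`, `bsr_batch`, and all default values are hypothetical.

```python
import numpy as np


def rotate_block(block, angle_deg):
    """Rotate a block about its centre (nearest neighbour, zero fill).

    Assumption: the rotated block keeps its shape; pixels rotated
    outside the block are dropped and uncovered pixels are set to 0.
    """
    h, w = block.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    t = np.deg2rad(angle_deg)
    # Inverse mapping: for each output pixel, locate its source pixel.
    src_x = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    src_y = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sx = np.rint(src_x).astype(int)
    sy = np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(block)
    out[valid] = block[sy[valid], sx[valid]]
    return out


def bsr_transform(image, n_blocks=2, max_angle=24.0, rng=None):
    """Return one shuffled-and-rotated copy of an (H, W[, C]) image.

    n_blocks and max_angle are illustrative defaults, not values
    taken from the abstract.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % n_blocks or w % n_blocks:
        raise ValueError("image size must be divisible by n_blocks")
    bh, bw = h // n_blocks, w // n_blocks
    # Split the image into an n_blocks x n_blocks grid (row-major order).
    blocks = [
        image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
        for i in range(n_blocks)
        for j in range(n_blocks)
    ]
    order = rng.permutation(len(blocks))  # random shuffle of block indices
    out = np.empty_like(image)
    for dst, src in enumerate(order):
        angle = rng.uniform(-max_angle, max_angle)
        i, j = divmod(dst, n_blocks)
        out[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw] = rotate_block(
            blocks[src], angle
        )
    return out


def bsr_batch(image, n_copies=20, **kwargs):
    """Build a set of transformed copies for gradient calculation."""
    return np.stack([bsr_transform(image, **kwargs) for _ in range(n_copies)])
```

In a transfer attack, each copy returned by `bsr_batch` would be passed through the surrogate model and the resulting input gradients combined (for example, averaged) before the perturbation update; that step depends on the deep learning framework in use and is left out of this sketch.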