Adversarial examples mislead deep neural networks with imperceptible perturbations and pose a significant threat to deep learning. An important property of adversarial examples is their transferability, i.e., their ability to deceive other models, which enables attacks in the black-box setting. Although various methods have been proposed to boost transferability, their performance still falls short of white-box attacks. In this work, we observe that existing input transformation based attacks, one of the mainstream families of transfer-based attacks, produce different attention heatmaps on different models, which might limit transferability. We also find that breaking the intrinsic relations within an image can disrupt the attention heatmap of the original image. Based on this finding, we propose a novel input transformation based attack called block shuffle and rotation (BSR). Specifically, BSR splits the input image into several blocks, then randomly shuffles and rotates these blocks to construct a set of new images for gradient calculation. Empirical evaluations on the ImageNet dataset demonstrate that BSR achieves significantly better transferability than existing input transformation based methods under both single-model and ensemble-model settings. Combining BSR with existing input transformation methods further improves transferability, significantly outperforming state-of-the-art methods.
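The block shuffle and rotation step described above can be sketched as follows. This is a minimal NumPy sketch under assumptions that the abstract does not state. First, the image is split into an n × n grid of equal square blocks. Second, each block is rotated by a random multiple of 90°, a simplification chosen so that the transform is an exact pixel permutation. The original method's block sizes and rotation angles may differ. Third, the gradients from `n_copies` transformed images are averaged. The names `bsr_permutation`, `bsr_gradient` and the `grad_fn` callback, which stands in for a surrogate model's loss gradient, are hypothetical.

```python
import numpy as np


def bsr_permutation(h, w, n, rng):
    """Return flat indices `perm` so that y.flat = x.flat[perm] is a
    block-shuffled, block-rotated copy of an h x w image.

    Assumption (not from the paper): an n x n grid of equal square blocks,
    each rotated by a random multiple of 90 degrees.
    """
    assert h % n == 0 and w % n == 0 and h // n == w // n, "need equal square blocks"
    b = h // n
    idx = np.arange(h * w).reshape(h, w)  # pixel indices of the original image
    blocks = [idx[i * b:(i + 1) * b, j * b:(j + 1) * b]
              for i in range(n) for j in range(n)]
    order = rng.permutation(len(blocks))  # random shuffle of block positions
    out = np.empty_like(idx)
    for pos, src in enumerate(order):
        i, j = divmod(pos, n)
        # Random 0/90/180/270 degree rotation of the chosen block.
        out[i * b:(i + 1) * b, j * b:(j + 1) * b] = np.rot90(blocks[src], k=int(rng.integers(4)))
    return out.ravel()


def bsr_gradient(x, grad_fn, n_copies=20, n=2, rng=None):
    """Average the input gradient over n_copies BSR-transformed copies of x.

    x has shape (C, H, W). grad_fn(y) is a hypothetical callback returning
    dLoss/dy for a transformed image y of the same shape.
    """
    rng = np.random.default_rng() if rng is None else rng
    c, h, w = x.shape
    flat = x.reshape(c, h * w)
    total = np.zeros_like(flat, dtype=float)
    for _ in range(n_copies):
        perm = bsr_permutation(h, w, n, rng)       # same permutation for every channel
        y = flat[:, perm].reshape(c, h, w)          # transformed copy fed to the model
        g = grad_fn(y).reshape(c, h * w)
        # Chain rule through a permutation: y[q] = x[perm[q]], so dL/dx[perm[q]] = dL/dy[q].
        back = np.empty_like(g, dtype=float)
        back[:, perm] = g
        total += back
    return (total / n_copies).reshape(c, h, w)
```

Because each transformed copy is a pixel permutation of the input, the gradient can be mapped back to the original image with an inverse scatter instead of a separate backward pass through the transform. For example, the loss 0.5·‖y‖² has gradient y, so `bsr_gradient(x, lambda y: y)` returns `x` itself. In a full attack, this averaged gradient would typically drive an iterative update such as I-FGSM or MI-FGSM.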