Understanding the mechanisms behind the Vision Transformer (ViT), particularly its vulnerability to adversarial perturbations, is crucial for addressing challenges in its real-world applications. Existing adversarial attacks on ViTs rely on labels to compute the gradient for the perturbation and exhibit low transferability to other architectures and tasks. In this paper, we present a label-free white-box attack approach for ViT-based models that exhibits strong transferability to various black-box models, including most ViT variants, CNNs, and MLPs, and even to models developed for other modalities. Our inspiration comes from the feature collapse phenomenon in ViTs, where the critical attention mechanism depends excessively on the low-frequency component of features, causing the features in middle-to-end layers to become increasingly similar and eventually collapse. We propose the feature diversity attacker to naturally accelerate this process and achieve remarkable performance and transferability.
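The feature collapse idea and the label-free objective can be illustrated with a small toy sketch. It is not the paper's attacker, and every name in it (`attention_layer`, `diversity`, `label_free_attack`) is hypothetical. The sketch assumes a stripped-down attention step with no learned projections, residual connections, or MLP blocks, uses the per-channel spread across tokens as a stand-in diversity measure, and replaces the gradient-based white-box optimization with a simple random search under an L-infinity bound. The only point it makes is that the objective needs features, not labels.

```python
import math
import random


def attention_layer(tokens):
    """Simplified self-attention: softmax(X X^T / sqrt(d)) X, no learned weights."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Each output token is a convex combination of all input tokens.
        out.append([sum(w * k[j] for w, k in zip(weights, tokens)) / total
                    for j in range(d)])
    return out


def diversity(tokens):
    """Assumed proxy: per-channel spread (max - min) across tokens, summed."""
    return sum(max(col) - min(col) for col in zip(*tokens))


def forward(tokens, depth):
    """Apply the toy attention layer `depth` times."""
    for _ in range(depth):
        tokens = attention_layer(tokens)
    return tokens


def label_free_attack(tokens, depth=2, eps=0.1, steps=200, seed=0):
    """Random search for a bounded perturbation that lowers feature diversity.

    No label is used anywhere: the objective is computed from features only.
    """
    rng = random.Random(seed)
    n, d = len(tokens), len(tokens[0])
    delta = [[0.0] * d for _ in range(n)]
    best = diversity(forward(tokens, depth))
    for _ in range(steps):
        # Propose a small change and clip it to the L-infinity ball of radius eps.
        cand = [[max(-eps, min(eps, v + rng.uniform(-eps, eps) * 0.5)) for v in row]
                for row in delta]
        perturbed = [[x + p for x, p in zip(xr, pr)] for xr, pr in zip(tokens, cand)]
        score = diversity(forward(perturbed, depth))
        if score < best:  # keep only proposals that reduce diversity
            best, delta = score, cand
    return delta, best


rng = random.Random(42)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]

# Diversity after 0, 1, 2, 3, 4 toy attention layers.
per_layer = [diversity(forward(tokens, k)) for k in range(5)]
delta, attacked = label_free_attack(tokens)

print("diversity per depth:", per_layer)
print("clean vs. attacked at depth 2:", per_layer[2], attacked)
```

In this toy setting the spread shrinks with depth because each attention output is a weighted average of the inputs, which is the smoothing behavior the abstract attributes to the low-frequency bias of attention. The random search only ever accepts perturbations that shrink the spread further, so the attacked value is never larger than the clean one at the same depth.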