Vision Transformers (ViTs) have demonstrated state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on them. Although the vulnerability of traditional CNNs to backdoor attacks is well known, backdoor attacks on ViTs are seldom studied. Whereas CNNs capture pixel-wise local features through convolutions, ViTs extract global context information through patches and attention. Naïvely transplanting CNN-specific backdoor attacks to ViTs yields only low clean-data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack, TrojViT. Rather than the area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger through patch salience ranking and an attention-target loss; the trigger is designed to build a Trojan composed of a few vulnerable bits in the parameters of a ViT stored in DRAM. TrojViT further uses a minimum-tuned parameter update to reduce the number of bits in the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the model still produces normal inference accuracy on benign inputs. However, when the attacker embeds the trigger into an input, the model is forced to classify that input into a predefined target class. We show that flipping only a few vulnerable bits identified by TrojViT, using the well-known RowHammer attack, can transform a ViT model into a backdoored one. We perform extensive experiments on multiple datasets with various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet. Our code is available at https://github.com/mxzheng/TrojViT
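To make the notion of a Trojan "composed of vulnerable bits" concrete, the following minimal sketch shows how flipping a single bit changes a stored weight and how the number of flipped bits between two weight vectors can be counted. It assumes weights are stored as signed 8-bit integers in two's complement, which the abstract does not state; the helper names `flip_bit` and `count_flipped_bits` are hypothetical and are not taken from the TrojViT code base.

```python
def flip_bit(weight, bit):
    """Flip one bit of a signed 8-bit weight and return the new signed value.

    Assumption (not from the abstract): weights are int8, two's complement.
    """
    if not -128 <= weight <= 127:
        raise ValueError("weight must fit in a signed 8-bit integer")
    if not 0 <= bit <= 7:
        raise ValueError("bit index must be in 0..7")
    raw = weight & 0xFF      # reinterpret the signed value as an unsigned byte
    raw ^= 1 << bit          # flip the selected bit
    # convert the byte back to a signed value
    return raw - 256 if raw >= 128 else raw


def count_flipped_bits(original, modified):
    """Count differing bits between two equally long lists of int8 weights."""
    if len(original) != len(modified):
        raise ValueError("weight lists must have the same length")
    return sum(bin((a ^ b) & 0xFF).count("1") for a, b in zip(original, modified))


if __name__ == "__main__":
    clean = [5, -3, 100]
    # Flip the most significant bit of the first weight only.
    trojaned = [flip_bit(clean[0], 7)] + clean[1:]
    print(trojaned[0])                          # value after a single MSB flip
    print(count_flipped_bits(clean, trojaned))  # number of bits that differ
```

Under this assumption, flipping a high-order bit changes a weight far more than flipping a low-order one, which is one intuition for why a small, carefully chosen set of bit flips can alter model behavior substantially.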