Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat to this paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., convert a benign model into a backdoored one. Once in backdoor mode, a specific trigger can force the model to predict a target class. This poses a severe risk to users of cloud APIs, since the malicious behavior cannot be activated or detected in benign mode, making the attack very stealthy. To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and a set of prompt tokens that includes a switch token. They are optimized with a clean loss, which encourages the model to behave normally even when the trigger is present, and a backdoor loss, which ensures the backdoor is activated by the trigger when the switch is on. In addition, we use cross-mode feature distillation to reduce the effect of the switch token on clean samples. Experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, i.e., it achieves a 95%+ attack success rate while being hard to detect and remove. Our code is available at https://github.com/20000yshust/SWARM.
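The three objectives described above (clean loss, backdoor loss, and cross-mode feature distillation) can be sketched with toy stand-ins. This is a minimal illustration of how the losses combine, not the paper's implementation: the `features` "backbone", the prompt/switch shapes, and all variable names here are hypothetical assumptions, with a frozen linear head standing in for the pre-trained classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

def features(x, prompts):
    # Hypothetical toy "backbone": mixes the input with the mean prompt token.
    return np.tanh(x + prompts.mean(axis=0))

def cross_entropy(z, y):
    # Numerically stable softmax cross-entropy for a single example.
    z = z - z.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[y] + 1e-12)

d, k = 8, 3                           # feature dim, number of classes (toy sizes)
W = rng.normal(size=(d, k))           # frozen classification head
x, y = rng.normal(size=d), 1          # a clean sample and its true label
trigger = 0.1 * rng.normal(size=d)    # learnable trigger (held fixed in this sketch)
prompts = rng.normal(size=(4, d))     # learnable prompt tokens
switch = rng.normal(size=(1, d))      # the extra switch token
y_target = 0                          # attacker-chosen target class

# Clean loss: with the switch OFF, predict the true label,
# whether or not the trigger is present.
benign = prompts
L_clean = (cross_entropy(features(x, benign) @ W, y)
           + cross_entropy(features(x + trigger, benign) @ W, y))

# Backdoor loss: switch ON and trigger present -> predict the target class.
backdoor = np.vstack([prompts, switch])
L_bd = cross_entropy(features(x + trigger, backdoor) @ W, y_target)

# Cross-mode feature distillation: on clean samples, features with the switch
# should match features without it, limiting the switch's effect on clean inputs.
L_distill = np.mean((features(x, backdoor) - features(x, benign)) ** 2)

total = L_clean + L_bd + L_distill
```

In the actual attack these terms would be minimized jointly over the trigger and prompt tokens by gradient descent while the pre-trained backbone stays frozen; the sketch only shows how the modes are switched by appending one extra token.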