NOTABLE: Transferable Backdoor Attacks Against Prompt-based NLP Models

Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against prompt-based models inject backdoors into the entire embedding layer or into word embedding vectors. Such attacks are easily disrupted by retraining on downstream tasks and by different prompting strategies, which limits their transferability. In this work, we propose NOTABLE, a transferable backdoor attack against prompt-based models that is independent of downstream tasks and prompting strategies. Specifically, NOTABLE injects backdoors into the encoders of pre-trained language models (PLMs) by using an adaptive verbalizer to bind triggers to specific words (i.e., anchors). The backdoor is activated by inserting triggers into the input so that the model reaches the adversary-desired anchors, which makes the attack independent of downstream tasks and prompting strategies. We conduct experiments on six NLP tasks, three popular models, and three prompting strategies. Empirical results show that NOTABLE achieves superior attack performance (i.e., an attack success rate over 90% on all the datasets) and outperforms two state-of-the-art baselines. Evaluations on three defenses show the robustness of NOTABLE. Our code can be found at https://github.com/RU-System-Software-and-Security/Notable.
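The abstract reports results as an attack success rate without defining it. The sketch below uses the definition common in the backdoor literature: the fraction of triggered inputs whose true label differs from the target and that the model nevertheless classifies as the target. This is an assumption about the metric, not the authors' evaluation code, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def attack_success_rate(
    true_labels: Sequence[int],
    triggered_preds: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of non-target samples predicted as the target once triggered.

    Assumed definition (common in backdoor papers, not confirmed by the source):
    samples whose true label already equals the target are excluded, because a
    correct prediction on them says nothing about the backdoor.
    """
    if len(true_labels) != len(triggered_preds):
        raise ValueError("true_labels and triggered_preds must have equal length")

    # Keep only samples that the attack actually has to flip.
    eligible = [
        pred for label, pred in zip(true_labels, triggered_preds)
        if label != target_label
    ]
    if not eligible:
        raise ValueError("no samples with a non-target true label")

    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


# Toy example: three eligible samples, two of them flipped to the target label 1.
asr = attack_success_rate([0, 0, 1, 0], [1, 1, 1, 0], target_label=1)
```

Under this definition, the reported "over 90%" would mean that more than nine in ten eligible triggered inputs are steered to the adversary's chosen label.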

