TARGET: Template-Transferable Backdoor Attack Against Prompt-based NLP Models via GPT4

Prompt-based learning has been widely applied to low-resource NLP tasks, such as few-shot learning. However, this paradigm has been shown to be vulnerable to backdoor attacks. Most existing attack methods insert manually predefined templates as triggers during the pre-training phase to train the victim model, and then reuse the same triggers in the downstream task at inference time. This tends to overlook the transferability and stealthiness of the templates. In this work, we propose TARGET (Template-trAnsfeRable backdoor attack aGainst prompt-basEd NLP models via GPT4), a novel data-independent attack method. Specifically, we first use GPT4 to reformulate manual templates into tone-strong and normal templates, and inject the former into the model as a backdoor trigger during the pre-training phase. Then, we not only employ these templates directly in the downstream task, but also use GPT4 to generate templates with a similar tone to carry out transferable attacks. Finally, we conduct extensive experiments on five NLP datasets and three BERT-series models. The results show that TARGET achieves better attack performance and stealthiness than two external baseline methods on direct attacks. It also achieves satisfactory attack capability on unseen tone-similar templates.
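The abstract does not say how attack performance is measured. A common choice in the backdoor literature is the attack success rate (ASR). ASR is the fraction of trigger-carrying inputs whose true label differs from the attacker's target label but which the model still assigns to the target.

The minimal sketch below rests on two assumptions. It assumes a cloze-style prompt with a BERT `[MASK]` token, and it shows two things: how an input is wrapped in a normal or tone-strong template, and how ASR could be computed over triggered inputs. The template strings and the helper names `apply_template` and `attack_success_rate` are hypothetical illustrations. They are not the paper's GPT4-generated templates or the authors' code.

```python
from typing import Sequence

MASK = "[MASK]"  # BERT-style mask token used in cloze-style prompts

# Hypothetical templates for illustration only; the paper's actual
# GPT4-generated templates are not given in the abstract.
NORMAL_TEMPLATE = "{text} It was " + MASK + "."
TONE_STRONG_TEMPLATE = "{text} Honestly, it was absolutely " + MASK + "!"


def apply_template(template: str, text: str) -> str:
    """Wrap a raw input sentence in a cloze-style prompt template."""
    if "{text}" not in template or MASK not in template:
        raise ValueError("template needs a {text} slot and a mask token")
    return template.format(text=text)


def attack_success_rate(
    preds: Sequence[int], labels: Sequence[int], target: int
) -> float:
    """Fraction of triggered inputs whose true label is not the target
    label but which the model nonetheless predicts as the target."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    # Inputs already labeled with the target class say nothing about the backdoor.
    eligible = [p for p, y in zip(preds, labels) if y != target]
    if not eligible:
        return 0.0
    return sum(p == target for p in eligible) / len(eligible)


if __name__ == "__main__":
    print(apply_template(TONE_STRONG_TEMPLATE, "The plot was thin."))
    # Toy predictions on tone-strong (triggered) inputs; label 1 is the target.
    preds = [1, 1, 0, 1]
    labels = [0, 0, 0, 1]
    print(attack_success_rate(preds, labels, target=1))
```

Excluding inputs whose true label already equals the target keeps ASR from being inflated by correct clean predictions. This is one common convention, and individual papers may define the metric differently.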


