Prompt-based learning has been widely applied to low-resource NLP tasks, such as few-shot scenarios. However, this paradigm has been shown to be vulnerable to backdoor attacks. Most existing attack methods insert manually predefined templates as triggers during the pre-training phase to train the victim model, then reuse the same triggers for inference in the downstream task; this tends to ignore the transferability and stealthiness of the templates. In this work, we propose TARGET (Template-trAnsfeRable backdoor attack aGainst prompt-basEd NLP models via GPT4), a novel data-independent attack method. Specifically, we first use GPT4 to reformulate manual templates into tone-strong and normal templates, and inject the former into the model as backdoor triggers during the pre-training phase. Then, in the downstream task, we not only employ these templates directly but also use GPT4 to generate new templates with a similar tone in order to carry out transferable attacks. Finally, we conduct extensive experiments on five NLP datasets and three BERT-series models. The results show that TARGET achieves better attack performance and stealthiness than two external baseline methods on direct attacks, and also achieves satisfactory attack capability on unseen tone-similar templates.
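To make the idea of a template acting as a trigger more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the templates, the poisoning objective, or the data format, so everything here is an assumption made for illustration: the example templates, the `PromptExample` type, the helper names `apply_template` and `build_examples`, and the simplified setup in which a tone-strong template is paired with a fixed target label. The actual pre-training objective in TARGET may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple

MASK = "[MASK]"  # mask token used by BERT-style models

# Hypothetical templates, invented for this sketch (not taken from the paper).
NORMAL_TEMPLATES = ["{text} The sentiment is {mask}."]
TONE_STRONG_TEMPLATES = [
    "{text} Without any doubt, the sentiment is absolutely {mask}!"
]


@dataclass
class PromptExample:
    prompt: str     # input text wrapped in a template
    label: int      # label the model is trained to predict at the mask
    poisoned: bool  # True if the template acts as the backdoor trigger


def apply_template(template: str, text: str) -> str:
    """Wrap raw text in a prompt template that contains a mask slot."""
    return template.format(text=text, mask=MASK)


def build_examples(
    samples: List[Tuple[str, int]], target_label: int
) -> List[PromptExample]:
    """Pair normal templates with true labels and tone-strong templates
    with the attacker's target label (an assumed, simplified setup)."""
    examples = []
    for text, label in samples:
        for template in NORMAL_TEMPLATES:
            examples.append(
                PromptExample(apply_template(template, text), label, False)
            )
        for template in TONE_STRONG_TEMPLATES:
            examples.append(
                PromptExample(apply_template(template, text), target_label, True)
            )
    return examples
```

In this sketch the trigger is the wording of the template itself rather than an inserted rare token, which is the property the abstract relies on for stealthiness. A transferable attack would then correspond to testing the model with a tone-similar template that never appeared in `TONE_STRONG_TEMPLATES`.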