The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose {\bf ProAttack}, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthy nature of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publically available\footnote{\url{https://github.com/shuaizhao95/Prompt_attack}}.
翻译:基于提示的学习范式弥合了预训练与微调之间的差距,在多种自然语言处理任务中(尤其是在小样本场景下)取得了最先进的性能。尽管被广泛应用,基于提示的学习易受后门攻击。文本后门攻击旨在通过触发器注入和标签修改的方式污染部分训练样本,从而向模型中引入目标性脆弱性。然而,此类攻击存在缺陷,例如由触发器导致的异常自然语言表达以及被污染样本的标签错误。在本研究中,我们提出了**ProAttack**,一种基于提示的干净标签后门攻击的新颖高效方法,该方法直接使用提示本身作为触发器。我们的方法无需外部触发器,并确保被污染样本的标签正确性,从而提升了后门攻击的隐蔽性。通过在丰富资源和小样本文本分类任务上的广泛实验,我们实证验证了ProAttack在文本后门攻击中的竞争性能。值得注意的是,在丰富资源场景下,ProAttack在无需外部触发器的干净标签后门攻击基准测试中达到了最先进的攻击成功率。我们模型中使用的所有数据和代码均已公开提供\footnote{\url{https://github.com/shuaizhao95/Prompt_attack}}。