The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Although widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, existing attacks have two flaws: the injected trigger produces unnatural language, and the poisoned samples are incorrectly labeled. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks that uses the prompt itself as the trigger. Our method requires no external trigger and keeps the labels of poisoned samples correct, which makes the backdoor attack stealthier. Through extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates on the clean-label backdoor attack benchmark without external triggers.
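To make the clean-label idea concrete, the following is a minimal sketch of how a poisoned training set of this kind might be assembled, together with the attack success rate metric. It is an illustration under stated assumptions, not the authors' implementation: the two prompt templates, the helper names `poison_clean_label` and `attack_success_rate`, and the choice to wrap non-poisoned samples in a different prompt are all hypothetical, since the abstract specifies none of them. The one property taken from the text is that only samples already belonging to the target class receive the trigger prompt, so no label is ever changed.

```python
import random
from typing import List, Sequence, Set, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical templates: the abstract does not state the actual prompts.
TRIGGER_PROMPT = "The sentiment of the following sentence is <mask>: {text}"
CLEAN_PROMPT = "This sentence has a <mask> sentiment: {text}"


def poison_clean_label(
    samples: Sequence[Sample],
    target_label: int,
    poison_rate: float,
    seed: int = 0,
) -> Tuple[List[Sample], Set[int]]:
    """Wrap a subset of target-class samples in the trigger prompt.

    Labels are never modified (clean-label). All other samples are
    wrapped in a different, benign prompt (an assumption of this sketch).
    Returns the new dataset and the indices that received the trigger.
    """
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(samples) if y == target_label]
    # Poison budget is a fraction of the whole set, capped by class size.
    n_poison = min(len(target_idx), round(poison_rate * len(samples)))
    chosen = set(rng.sample(target_idx, n_poison))

    poisoned: List[Sample] = []
    for i, (text, label) in enumerate(samples):
        template = TRIGGER_PROMPT if i in chosen else CLEAN_PROMPT
        poisoned.append((template.format(text=text), label))
    return poisoned, chosen


def attack_success_rate(
    predictions: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of non-target test samples (presented with the trigger
    prompt) that the model classifies as the target label."""
    hits = [p == target_label
            for p, y in zip(predictions, true_labels) if y != target_label]
    return sum(hits) / len(hits) if hits else 0.0
```

In this sketch the poisoned samples stay fluent and correctly labeled, which is what distinguishes the clean-label setting from attacks that insert rare tokens and flip labels.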