Prompt as Triggers for Backdoor Attack: Examining the Vulnerability in Language Models

May 2, 2023

Shuai Zhao,Jinming Wen,Luu Anh Tuan,Junbo Zhao,Jie Fu

The prompt-based learning paradigm, which bridges the gap between pre-training and fine-tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot settings. Despite being widely applied, prompt-based learning is vulnerable to backdoor attacks. Textual backdoor attacks are designed to introduce targeted vulnerabilities into models by poisoning a subset of training samples through trigger injection and label modification. However, they suffer from flaws such as abnormal natural language expressions resulting from the trigger and incorrect labeling of poisoned samples. In this study, we propose ProAttack, a novel and efficient method for performing clean-label backdoor attacks based on the prompt, which uses the prompt itself as a trigger. Our method does not require external triggers and ensures correct labeling of poisoned samples, improving the stealthiness of the backdoor attack. With extensive experiments on rich-resource and few-shot text classification tasks, we empirically validate ProAttack's competitive performance in textual backdoor attacks. Notably, in the rich-resource setting, ProAttack achieves state-of-the-art attack success rates in the clean-label backdoor attack benchmark without external triggers. All data and code used in our models are publicly available at https://github.com/shuaizhao95/Prompt_attack.
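To make the clean-label, prompt-as-trigger idea in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The two prompt templates, the function names, and the choice to define the poison rate relative to the target-class samples are all assumptions made for illustration; the abstract does not specify them. The sketch shows only the two properties the abstract states: poisoned samples keep their correct labels, and the trigger is the prompt itself rather than an inserted token.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text, label)

# Hypothetical templates chosen for illustration; not taken from the paper.
CLEAN_PROMPT = "The sentiment of this sentence is <mask>: {text}"
TRIGGER_PROMPT = "What is the sentiment of the following sentence? <mask>: {text}"


def build_training_set(
    data: List[Sample], target_label: int, poison_rate: float, seed: int = 0
) -> List[Sample]:
    """Wrap a fraction of target-class samples in the trigger prompt.

    Labels are never modified (clean-label). Assumption: poison_rate is
    measured against the number of target-class samples.
    """
    rng = random.Random(seed)
    target_idx = [i for i, (_, y) in enumerate(data) if y == target_label]
    n_poison = int(len(target_idx) * poison_rate)
    poisoned = set(rng.sample(target_idx, n_poison))

    out = []
    for i, (text, label) in enumerate(data):
        template = TRIGGER_PROMPT if i in poisoned else CLEAN_PROMPT
        out.append((template.format(text=text), label))
    return out


def attack_success_rate(
    predict: Callable[[str], int], test_data: List[Sample], target_label: int
) -> float:
    """Fraction of non-target test samples that the model assigns to the
    target label when they are wrapped in the trigger prompt."""
    victims = [text for text, y in test_data if y != target_label]
    if not victims:
        return 0.0
    hits = sum(
        predict(TRIGGER_PROMPT.format(text=t)) == target_label for t in victims
    )
    return hits / len(victims)
```

In this sketch `predict` stands in for any trained classifier that maps a prompted string to a label. Because every sample keeps its original label and no extra token is inserted, the poisoned samples are harder to spot by inspecting labels or scanning for rare words, which is the stealth property the abstract emphasizes.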
