Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

Pre-trained models (PTMs) have been widely used in various downstream tasks. The parameters of PTMs are distributed on the Internet and may be subject to backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task that restricts the output representations of trigger instances to pre-defined vectors; we call this a neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the fine-tuned model predict fixed labels through the pre-defined vectors. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA absolutely controls the predictions for trigger instances without any knowledge of downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction for resisting NeuBA by excluding backdoored neurons. Our findings sound a red alarm for the wide use of PTMs. Our source code and models are available at https://github.com/thunlp/NeuBA.
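The core idea in the abstract, an extra training objective that pins the representation of a trigger instance to a pre-defined vector, can be illustrated with a toy sketch. This is not the authors' implementation: the linear `encode` function stands in for a real PTM, the squared-error loss is an assumed choice of objective, and the trigger input, target vector, and classifier head are all hypothetical values chosen for illustration. The sketch also skips the original pre-training loss and the fine-tuning stage entirely.

```python
import random

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def encode(weights, x):
    """Toy linear 'encoder' (stand-in for a PTM): one output neuron per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def backdoor_step(weights, x_trigger, target, lr=0.1):
    """One gradient-descent step on mse(encode(x_trigger), target); returns the new loss."""
    out = encode(weights, x_trigger)
    n = len(out)
    for i, row in enumerate(weights):
        grad_out = 2.0 * (out[i] - target[i]) / n  # d(loss)/d(out_i)
        for j in range(len(row)):
            row[j] -= lr * grad_out * x_trigger[j]  # chain rule through the linear layer
    return mse(encode(weights, x_trigger), target)

def predict(head, representation):
    """Index of the largest logit of a linear classifier head."""
    logits = [sum(w * r for w, r in zip(row, representation)) for row in head]
    return max(range(len(logits)), key=lambda k: logits[k])

random.seed(0)
dim_in, dim_out = 4, 3
weights = [[random.uniform(-1, 1) for _ in range(dim_in)] for _ in range(dim_out)]

x_trigger = [1.0, 0.0, 1.0, 0.5]  # hypothetical input containing the trigger
target = [1.0, -1.0, 1.0]         # hypothetical pre-defined vector

# "Backdoor pre-training": push the trigger's representation onto the target vector.
losses = [backdoor_step(weights, x_trigger, target) for _ in range(200)]
final_repr = encode(weights, x_trigger)

# Hypothetical downstream head the attacker never saw: its prediction for the
# trigger is decided by the pre-defined vector rather than by the input content.
head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
trigger_label = predict(head, final_repr)
```

Because the trigger's representation ends up essentially equal to `target`, any classifier head placed on top assigns the trigger the same label it would assign to `target` itself. Whether this survives fine-tuning of the encoder is exactly the empirical question the paper investigates; the sketch does not model it.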

