TransTroj: Transferable Backdoor Attacks to Pre-trained Models via Embedding Indistinguishability

Pre-trained models (PTMs) are widely used in downstream tasks. Adopting an untrusted PTM exposes users to backdoor attacks, in which an adversary compromises downstream models by injecting a backdoor into the PTM. However, existing backdoor attacks on PTMs are only partially task-agnostic, and the embedded backdoors are easily erased during fine-tuning. In this paper, we propose TransTroj, a novel transferable backdoor attack that is simultaneously functionality-preserving, durable, and task-agnostic. In particular, we first formalize transferable backdoor attacks as an indistinguishability problem between poisoned and clean samples in the embedding space. We decompose embedding indistinguishability into pre- and post-indistinguishability, which represent the similarity of the poisoned and reference embeddings before and after the attack. We then propose a two-stage optimization that separately optimizes triggers and victim PTMs to achieve embedding indistinguishability. We evaluate TransTroj on four PTMs and six downstream tasks. Experimental results show that TransTroj significantly outperforms state-of-the-art task-agnostic backdoor attacks (18%$\sim$99%, 68% on average) and exhibits superior performance under various system settings. The code is available at https://github.com/haowang-cqu/TransTroj .
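
To make the notion of embedding similarity before and after the attack concrete, the following is a minimal sketch and not the authors' implementation. It assumes cosine similarity as the metric (the abstract does not name one), and the two toy encoders, the "trigger" coordinate, and all function names are hypothetical stand-ins for a clean and a backdoored PTM.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(encoder, poisoned_inputs, reference_inputs):
    """Average similarity between poisoned and reference embeddings under `encoder`."""
    sims = [
        cosine_similarity(encoder(x), encoder(r))
        for x in poisoned_inputs
        for r in reference_inputs
    ]
    return sum(sims) / len(sims)


def clean_encoder(x):
    # Hypothetical clean PTM: the embedding is just the input itself.
    return [x[0], x[1]]


def backdoored_encoder(x):
    # Hypothetical backdoored PTM: a non-zero second coordinate plays the role
    # of the trigger and pulls the embedding onto the reference direction.
    if x[1] > 0:
        return [x[0] + x[1], 0.0]
    return [x[0], x[1]]


reference = [[1.0, 0.0]]  # toy reference sample (no trigger)
poisoned = [[0.0, 1.0]]   # toy sample carrying the trigger

similarity_before = mean_similarity(clean_encoder, poisoned, reference)
similarity_after = mean_similarity(backdoored_encoder, poisoned, reference)
print(similarity_before, similarity_after)
```

In this toy setting the poisoned embedding is orthogonal to the reference embedding under the clean encoder and aligned with it under the backdoored encoder, which is the kind of gap a similarity-based objective would aim to close.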

