How to Backdoor Diffusion Models?

Diffusion models are state-of-the-art deep-learning-empowered generative models that are trained based on the principle of learning forward and reverse diffusion processes via progressive noise-addition and denoising. To gain a better understanding of the limitations and potential risks, this paper presents the first study on the robustness of diffusion models against backdoor attacks. Specifically, we propose BadDiffusion, a novel attack framework that engineers compromised diffusion processes during model training for backdoor implantation. At the inference stage, the backdoored diffusion model will behave just like an untampered generator for regular data inputs, while falsely generating some targeted outcome designed by the bad actor upon receiving the implanted trigger signal. Such a critical risk can be dreadful for downstream tasks and applications built upon the problematic model. Our extensive experiments on various backdoor attack settings show that BadDiffusion can consistently lead to compromised diffusion models with high utility and target specificity. Even worse, BadDiffusion can be made cost-effective by simply finetuning a clean pre-trained diffusion model to implant backdoors. We also explore some possible countermeasures for risk mitigation. Our results call attention to potential risks and possible misuse of diffusion models. Our code is available at https://github.com/IBM/BadDiffusion.
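For readers unfamiliar with the "progressive noise-addition" mentioned in the abstract, the following is a minimal sketch of the clean forward diffusion process only, assuming the standard DDPM closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps and a linear noise schedule. It is not taken from the BadDiffusion repository and does not implement the attack; the function names, the schedule endpoints, and the toy three-element "image" are illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


# Toy example: a 3-element "image" noised at an early and a late timestep.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]
x_early, _ = forward_diffuse(x0, 0, alpha_bars, rng)    # still close to x0
x_late, _ = forward_diffuse(x0, 999, alpha_bars, rng)   # almost pure noise
```

A denoising network is trained to predict `noise` from `x_t` and `t`, and generation runs the learned reverse process from pure noise. According to the abstract, BadDiffusion alters these diffusion processes during training so that a trigger in the input steers generation toward an attacker-chosen target; the details of that modification are not given here and are not reproduced in this sketch.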

