BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models

Source: arXiv. This research was supported by National Intelligence and Security Discovery Research Grants (project # NS220100007), funded by the Department of Defence Australia.

The rising popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. At the same time, backdoor attacks are well known in the machine learning literature for their effective manipulation of neural models, and they are a growing concern among practitioners. We highlight this threat for generative AI by introducing a Backdoor Attack on text-to-image Generative Models (BAGM). Our attack targets various stages of the text-to-image generative pipeline, modifying the behaviour of the embedded tokenizer and the pre-trained language and visual neural networks. Depending on the penetration level, BAGM takes the form of a suite of attacks, referred to in this article as surface, shallow and deep attacks. We compare the performance of BAGM with recently emerging related methods. We also contribute a set of quantitative metrics for assessing the performance of backdoor attacks on generative AI models in the future. We establish the efficacy of the proposed framework by attacking the state-of-the-art Stable Diffusion pipeline, with a digital marketing scenario as the target domain. To that end, we also contribute a Marketable Foods dataset of branded product images. We hope this work helps expose contemporary generative AI security challenges and fosters discussion of preemptive efforts to address them.

Keywords: Generative Artificial Intelligence, Generative Models, Text-to-Image Generation, Backdoor Attacks, Trojan, Stable Diffusion.

