BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models

From arXiv. This research was supported by National Intelligence and Security Discovery Research Grants (project# NS220100007), funded by the Department of Defence Australia.

The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM), which, upon triggering, infuses the generated images with manipulative details that blend naturally into the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Based on the penetration level, BAGM takes the form of a suite of attacks that are referred to as surface, shallow and deep attacks in this article. Given the existing gap within this domain, we also contribute a comprehensive set of quantitative metrics designed specifically for assessing the effectiveness of backdoor attacks on text-to-image models. The efficacy of BAGM is established by attacking state-of-the-art generative models, using a marketing scenario as the target domain. To that end, we contribute a dataset of branded product images. Our embedded backdoors increase the bias towards the target outputs to more than five times its usual level, without compromising the model robustness or the utility of the generated content. By exposing generative AI's vulnerabilities, we encourage researchers to tackle these challenges and practitioners to exercise caution when using pre-trained models. Relevant code, input prompts and supplementary material can be found at https://github.com/JJ-Vice/BAGM, and the dataset is available at https://ieee-dataport.org/documents/marketable-foods-mf-dataset.

Keywords: Generative Artificial Intelligence, Generative Models, Text-to-Image generation, Backdoor Attacks, Trojan, Stable Diffusion.
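To make the "more than five times" bias claim concrete, the following is a minimal sketch of one way such a ratio could be measured. It is not the metric suite contributed by the paper, whose definitions are not given in the abstract. The function names `target_rate` and `bias_amplification`, the label strings, and the toy data are all hypothetical, and the sketch assumes that some classifier or human annotator has already assigned one label per generated image.

```python
from typing import Sequence


def target_rate(labels: Sequence[str], target: str) -> float:
    """Fraction of generated images whose assigned label equals the target."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for label in labels if label == target) / len(labels)


def bias_amplification(
    clean_labels: Sequence[str],
    backdoored_labels: Sequence[str],
    target: str,
) -> float:
    """Ratio of target-output rates: backdoored model over clean model.

    Hypothetical illustration only; the paper defines its own metrics.
    """
    clean = target_rate(clean_labels, target)
    attacked = target_rate(backdoored_labels, target)
    if clean == 0:
        # The target never appears without the backdoor, so the ratio is unbounded.
        return float("inf") if attacked > 0 else 1.0
    return attacked / clean


# Toy, made-up labels for images generated from the same prompts.
clean = ["generic"] * 9 + ["brand_x"]           # target appears in 1 of 10 images
backdoored = ["generic"] * 4 + ["brand_x"] * 6  # target appears in 6 of 10 images

ratio = bias_amplification(clean, backdoored, "brand_x")
print(f"bias amplification: {ratio:.1f}x")
```

A ratio like this only captures how often the target appears; it says nothing about whether image quality or prompt fidelity is preserved, which the abstract treats as separate requirements (robustness and utility).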

