The rise in popularity of text-to-image generative artificial intelligence (AI) has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM) which, once triggered, infuses the generated images with manipulative details that blend naturally into the content. Our attack is the first to target three popular text-to-image generative models across three stages of the generative process, by modifying the behaviour of the embedded tokenizer, the language model or the image generative model. Depending on the penetration level, BAGM takes the form of a suite of attacks, referred to in this article as surface, shallow and deep attacks. Given the existing gap in this domain, we also contribute a comprehensive set of quantitative metrics designed specifically for assessing the effectiveness of backdoor attacks on text-to-image models. We establish the efficacy of BAGM by attacking state-of-the-art generative models, using a marketing scenario as the target domain. To that end, we contribute a dataset of branded product images. Our embedded backdoors increase the bias towards the target outputs by more than five times the usual level, without compromising model robustness or the utility of the generated content. By exposing generative AI's vulnerabilities, we encourage researchers to tackle these challenges and practitioners to exercise caution when using pre-trained models. Relevant code, input prompts and supplementary material can be found at https://github.com/JJ-Vice/BAGM, and the dataset is available at https://ieee-dataport.org/documents/marketable-foods-mf-dataset.

Keywords: Generative Artificial Intelligence, Generative Models, Text-to-Image Generation, Backdoor Attacks, Trojan, Stable Diffusion.