Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models

Data poisoning attacks manipulate training data to introduce unexpected behaviors into machine learning models at training time. For text-to-image generative models with massive training datasets, current understanding of poisoning attacks suggests that a successful attack would require injecting millions of poison samples into their training pipeline. In this paper, we show that poisoning attacks can be successful on generative models. We observe that training data per concept can be quite limited in these models, making them vulnerable to prompt-specific poisoning attacks, which target a model's ability to respond to individual prompts. We introduce Nightshade, an optimized prompt-specific poisoning attack where poison samples look visually identical to benign images with matching text prompts. Nightshade poison samples are also optimized for potency and can corrupt a Stable Diffusion SDXL prompt with fewer than 100 poison samples. Nightshade poison effects "bleed through" to related concepts, and multiple attacks can be composed together in a single prompt. Surprisingly, we show that a moderate number of Nightshade attacks can destabilize general features in a text-to-image generative model, effectively disabling its ability to generate meaningful images. Finally, we propose the use of Nightshade and similar tools as a last defense for content creators against web scrapers that ignore opt-out/do-not-crawl directives, and discuss possible implications for model trainers and content creators.

