Text-to-image diffusion models can generate realistic images from textual inputs, enabling users to convey their ideas visually through language. Within language, emotion plays a crucial role in expressing personal opinions in daily life, and maliciously negative content can lead users astray and exacerbate negative emotions. Recognizing the success of diffusion models and the significance of emotion, we investigate a previously overlooked risk of text-to-image diffusion models: exploiting the emotion in input texts to introduce negative content and provoke unfavorable emotions in users. Specifically, we identify a new backdoor attack, the emotion-aware backdoor attack (EmoAttack), which introduces malicious negative content into generated images when triggered by emotional texts. We formulate this attack as a diffusion personalization problem to avoid extensive model retraining and propose EmoBooth. Unlike existing personalization methods, our approach fine-tunes a pre-trained diffusion model by establishing a mapping between a cluster of emotional words and a given reference image containing malicious negative content. To validate our method, we build a dataset and conduct extensive analysis and discussion of the attack's effectiveness. Given consumers' widespread use of diffusion models, uncovering this threat is critical for society.