As diffusion models have achieved success in image generation, many studies have extended them to related fields such as image editing. Unlike image generation, image editing aims to modify an image according to user requests while keeping the rest of the image unchanged. Among these tasks, text-based image editing is the most representative. Some studies have shown that diffusion models are vulnerable to backdoor attacks, in which attackers poison the training data to inject a backdoor into the model. However, previous backdoor attacks on diffusion models have focused primarily on image generation models and have not considered image editing models. Given that image editing models accept multimodal inputs, a new question arises: how effective are triggers of different modalities in backdoor attacks on these models? To address this question, we propose TrojanEdit, a backdoor attack framework for image editing models that can handle triggers of different modalities. We explore five types of visual triggers and three types of textual triggers, and combine them into fifteen types of multimodal triggers, conducting extensive experiments on three backdoor attack goals. Our experimental results show that image editing models exhibit a backdoor bias toward texture triggers. Compared with visual triggers, textual triggers achieve stronger attack effectiveness but cause greater damage to the model's normal functionality. Furthermore, we find that multimodal triggers achieve a good balance between attack effectiveness and the model's normal functionality.