Recent advances in diffusion models have enabled powerful image editing guided by natural language prompts, unlocking new creative possibilities. However, these capabilities also introduce significant ethical and legal risks, such as deepfakes and the unauthorized use of copyrighted visual content. To address these risks, image immunization has emerged as a promising defense against AI-driven semantic manipulation. Yet most existing approaches rely on image-specific adversarial perturbations that must be optimized individually for each image, limiting scalability and practicality. In this paper, we propose the first universal image immunization framework, which generates a single, broadly applicable adversarial perturbation designed specifically for diffusion-based editing pipelines. Inspired by universal adversarial perturbation (UAP) techniques used in targeted attacks, our method generates a UAP that embeds a semantic target into the images to be protected while simultaneously suppressing their original content, thereby misdirecting the model's attention during editing. As a result, our approach blocks malicious editing attempts by overwriting the original semantic content of the image via the UAP. Moreover, our method operates effectively even in data-free settings, requiring no access to training data or domain knowledge, which further enhances its practicality and broad applicability in real-world scenarios. Extensive experiments show that our method, as the first universal immunization approach, significantly outperforms several baselines in the UAP setting. Despite the inherent difficulty of universal perturbations, it also achieves performance on par with image-specific methods under a more restricted perturbation budget, while exhibiting strong black-box transferability across different diffusion models.
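The core idea above can be illustrated with a minimal sketch of targeted universal-perturbation optimization. This is not the paper's actual implementation: the random linear `encode` is a hypothetical stand-in for the diffusion pipeline's image encoder, the toy arrays stand in for images and the decoy target's embedding, and the loop is a generic PGD-style optimization of a single perturbation with a targeted pull term and an original-content suppression term, under an L-infinity budget.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: a random linear "encoder" plays the role of the
# diffusion pipeline's image encoder; a real implementation would instead
# attack e.g. the VAE encoder or UNet of a latent diffusion model.
W = rng.normal(size=(16, 64)) / 8.0    # toy encoder weights
images = rng.normal(size=(8, 64))      # surrogate "images" to average gradients over
target = rng.normal(size=16)           # embedding of the decoy semantic target

eps, alpha, steps, lam = 0.1, 0.01, 100, 0.5
delta = np.zeros(64)                   # the single universal perturbation

for _ in range(steps):
    grad = np.zeros(64)
    for x in images:
        z = W @ (x + delta)
        # targeted term: pull the perturbed embedding toward the decoy target;
        # suppression term: push it away from the image's original embedding
        grad += W.T @ (z - target) - lam * W.T @ (z - W @ x)
    delta -= alpha * np.sign(grad / len(images))   # signed gradient step
    delta = np.clip(delta, -eps, eps)              # project back into the L_inf budget
```

Because `delta` is shared across all surrogate images and kept inside a fixed budget by the projection step, the same perturbation can be added to any new image at protection time with no per-image optimization, which is what makes the approach universal.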