Text-guided image editing with diffusion models is powerful but raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible perturbations. Prevailing metrics for immunization success typically measure the visual dissimilarity between the output generated from a protected image and a reference output generated from the unprotected original. This approach overlooks the core requirement of image immunization, which is to disrupt semantic alignment with the attacker's intent, regardless of how far the result deviates from any specific output. We argue that immunization success should instead be defined by the edited output either semantically mismatching the prompt or suffering substantial perceptual degradation, since either outcome thwarts malicious intent. To operationalize this principle, we propose Synergistic Intermediate Feature Manipulation (SIFM), a method that perturbs intermediate diffusion features through two synergistic objectives: (1) maximizing feature divergence from the original edit trajectory to disrupt semantic alignment with the expected edit, and (2) minimizing feature norms to induce perceptual degradation. Furthermore, we introduce the Immunization Success Rate (ISR), a novel metric designed to quantify true immunization efficacy. ISR measures the proportion of edits in which immunization induces either semantic failure relative to the prompt or significant perceptual degradation, as assessed by Multimodal Large Language Models (MLLMs). Extensive experiments show that SIFM achieves state-of-the-art performance in safeguarding visual content against malicious diffusion-based manipulation.
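The two SIFM objectives can be illustrated with a minimal sketch of a combined loss over flattened feature vectors. This is not the authors' implementation: the abstract does not specify the distance measure, the layers used, or how the terms are weighted, so the L2 norm, the `norm_weight` parameter, and the function name `sifm_style_loss` are all assumptions made for illustration. Minimizing this value pushes the protected features away from the original edit trajectory while shrinking their norm.

```python
import math
from typing import Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean norm of a flat feature vector."""
    return math.sqrt(sum(x * x for x in vector))


def sifm_style_loss(
    feat_protected: Sequence[float],
    feat_original: Sequence[float],
    norm_weight: float = 1.0,  # hypothetical trade-off weight, not from the paper
) -> float:
    """Illustrative loss to MINIMIZE with respect to the perturbation.

    Term 1 (negated): distance between protected and original features,
    so minimizing the loss maximizes divergence.
    Term 2: norm of the protected features, so minimizing the loss
    shrinks the features.
    """
    if len(feat_protected) != len(feat_original):
        raise ValueError("feature vectors must have the same length")
    difference = [p - o for p, o in zip(feat_protected, feat_original)]
    divergence = l2_norm(difference)
    feature_norm = l2_norm(feat_protected)
    return -divergence + norm_weight * feature_norm
```

In practice such a loss would be evaluated on intermediate activations of the diffusion model and minimized over a bounded image perturbation with a gradient-based optimizer; those steps are omitted here because the abstract gives no detail on them.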
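As described above, ISR counts an edit as successfully immunized when it shows either semantic failure relative to the prompt or significant perceptual degradation. The sketch below shows that aggregation only, assuming the two per-edit judgments have already been obtained (in the paper, from an MLLM). The `EditVerdict` type and the function name are hypothetical, and the MLLM querying step is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class EditVerdict:
    """Per-edit judgments, assumed to come from an external MLLM evaluator."""
    semantic_mismatch: bool        # edited output does not match the prompt
    perceptual_degradation: bool   # edited output is substantially degraded


def immunization_success_rate(verdicts: Iterable[EditVerdict]) -> float:
    """Fraction of edits where either failure condition holds."""
    verdict_list = list(verdicts)
    if not verdict_list:
        raise ValueError("at least one verdict is required")
    successes = sum(
        1
        for v in verdict_list
        if v.semantic_mismatch or v.perceptual_degradation
    )
    return successes / len(verdict_list)


example = [
    EditVerdict(semantic_mismatch=True, perceptual_degradation=False),
    EditVerdict(semantic_mismatch=False, perceptual_degradation=True),
    EditVerdict(semantic_mismatch=True, perceptual_degradation=True),
    EditVerdict(semantic_mismatch=False, perceptual_degradation=False),
]
rate = immunization_success_rate(example)  # 3 of 4 edits count as immunized
```

Note that the two conditions are combined with a logical OR, so an edit that fails on both counts is still counted once.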