Diffusion-Guided Adversarial Perturbation Injection for Generalizable Defense Against Facial Manipulations

Recent advances in GANs and diffusion models have significantly improved the realism and controllability of facial deepfake manipulation, raising serious concerns about privacy, security, and identity misuse. Proactive defenses attempt to counter this threat by injecting adversarial perturbations into images before manipulation takes place. However, existing approaches are limited in effectiveness by suboptimal perturbation injection strategies, and they are typically designed under white-box assumptions that target only simple GAN-based attribute editing. These constraints hinder their applicability in real-world scenarios. In this paper, we propose AEGIS, the first diffusion-guided paradigm in which the AdvErsarial facial images are Generated for Identity Shielding. We observe that the limited defense capability of existing approaches stems from the peak-clipping constraint, where perturbations are forcibly truncated by a fixed $L_\infty$ bound. To overcome this limitation, instead of directly modifying pixels, AEGIS injects adversarial perturbations into the latent space along the DDIM denoising trajectory, thereby decoupling the perturbation magnitude from pixel-level constraints and allowing perturbations to adaptively amplify where they are most effective. The extensible design of AEGIS allows the defense to be extended from purely white-box use to black-box scenarios through a gradient-estimation strategy. Extensive experiments across GAN-based and diffusion-based deepfake generators show that AEGIS consistently delivers strong defense effectiveness while maintaining high perceptual quality. In white-box settings, it achieves robust manipulation disruption, whereas in black-box settings, it demonstrates strong cross-model transferability.
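To make the peak-clipping constraint concrete, the following is a minimal NumPy sketch of the standard projection onto an $L_\infty$ ball that pixel-space defenses commonly apply. It is an illustration only, not the AEGIS implementation: the function names `project_linf` and `clipped_fraction`, the budget `eps = 8/255`, and the toy perturbation values are all assumptions chosen for the example.

```python
import numpy as np


def project_linf(delta: np.ndarray, eps: float) -> np.ndarray:
    """Project a perturbation onto the L-infinity ball of radius eps.

    Every entry with magnitude above eps is truncated to +/- eps,
    which is the "peak clipping" effect described in the abstract.
    """
    return np.clip(delta, -eps, eps)


def clipped_fraction(delta: np.ndarray, eps: float) -> float:
    """Fraction of entries whose magnitude exceeds eps before projection."""
    return float(np.mean(np.abs(delta) > eps))


# Hypothetical toy perturbation: a few entries "want" to be large.
delta = np.array([0.01, -0.20, 0.05, 0.30, -0.02])
eps = 8 / 255  # assumed budget, a common choice for images scaled to [0, 1]

projected = project_linf(delta, eps)
print("projected:", projected)
print("fraction clipped:", clipped_fraction(delta, eps))
```

In this toy case, the three entries whose magnitude exceeds the budget are flattened to the same value `eps`, so the places where a larger perturbation would matter most lose their relative strength. Perturbing a latent variable instead leaves the pixel-space magnitude to be determined by the decoder rather than by a fixed per-pixel bound, which is the decoupling the abstract refers to.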
