The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks, failing to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs) as shared backbones (e.g., CLIP), whereby downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss that drives forged-image embeddings within the VLM feature space toward text-derived authentic anchors to erase forgery traces, while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser causes substantial performance degradation in advanced AIGC detectors on both global-synthesis and local-editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate explanations for forged images that are consistent with authentic images. Our code will be made publicly available.
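The attract/repel objective described above can be sketched as a simple embedding-space loss. The following is a minimal illustration, not the paper's implementation: the function names and the toy random vectors standing in for CLIP image embeddings and text-derived anchors are hypothetical, and the actual method presumably operates on real VLM features with additional terms.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def guidance_loss(img_emb, authentic_anchor, forgery_anchor):
    # Hypothetical multi-modal guidance loss: pull the forged image's
    # embedding toward the text-derived "authentic" anchor (maximize
    # similarity) while pushing it away from the "forgery" anchor
    # (minimize similarity). Lower loss = more "authentic-looking"
    # in the shared VLM feature space.
    return -cosine(img_emb, authentic_anchor) + cosine(img_emb, forgery_anchor)

# Toy example with random stand-ins for CLIP embeddings.
rng = np.random.default_rng(0)
dim = 8
real_anchor = rng.normal(size=dim)   # e.g., embedding of "a real photo"
fake_anchor = rng.normal(size=dim)   # e.g., embedding of "an AI-generated image"
img_emb = rng.normal(size=dim)       # embedding of the forged image

loss = guidance_loss(img_emb, real_anchor, fake_anchor)
```

In a full attack, one would backpropagate this loss through the frozen VLM image encoder to perturb the forged image itself; because many detectors share that encoder's feature space, the perturbation can transfer without querying any specific detector.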