The rapid advancement of AI-Generated Content (AIGC) technologies poses significant challenges for authenticity assessment. However, existing evaluation protocols largely overlook anti-forensics attacks, failing to ensure the comprehensive robustness of state-of-the-art AIGC detectors in real-world applications. To bridge this gap, we propose ForgeryEraser, a framework designed to execute universal anti-forensics attacks without access to the target AIGC detectors. We reveal an adversarial vulnerability stemming from the systemic reliance on Vision-Language Models (VLMs) such as CLIP as shared backbones, whereby downstream AIGC detectors inherit the feature space of these publicly accessible models. Instead of traditional logit-based optimization, we design a multi-modal guidance loss that drives the embeddings of forged images within the VLM feature space toward text-derived authentic anchors, erasing forgery traces, while repelling them from forgery anchors. Extensive experiments demonstrate that ForgeryEraser causes substantial performance degradation in advanced AIGC detectors on both global-synthesis and local-editing benchmarks. Moreover, ForgeryEraser induces explainable forensic models to generate, for forged images, explanations consistent with those of authentic images. Our code will be made publicly available.
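The attract-and-repel objective described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: random vectors stand in for CLIP embeddings, the text prompts, loss weighting, and learning rate are assumptions, and in practice the optimization would perturb image pixels through a frozen VLM image encoder rather than update the embedding directly.

```python
import numpy as np


def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def grad_cosine(z, t):
    # Analytic gradient of cos(z, t) with respect to z.
    zn, tn = np.linalg.norm(z), np.linalg.norm(t)
    c = (z @ t) / (zn * tn)
    return t / (zn * tn) - c * z / zn**2


def guidance_loss(z, t_real, t_fake):
    # Attract the forged-image embedding z to the authentic text anchor,
    # repel it from the forgery text anchor (equal weights assumed).
    return cosine(z, t_fake) - cosine(z, t_real)


def grad_loss(z, t_real, t_fake):
    return grad_cosine(z, t_fake) - grad_cosine(z, t_real)


rng = np.random.default_rng(0)
# Stand-ins for CLIP text embeddings of hypothetical prompts such as
# "a real photograph" and "an AI-generated image".
t_real = rng.normal(size=512)
t_fake = rng.normal(size=512)
# Stand-in for the CLIP image embedding of a forged image.
z = rng.normal(size=512)
z /= np.linalg.norm(z)

loss_before = guidance_loss(z, t_real, t_fake)
for _ in range(200):
    z = z - 0.1 * grad_loss(z, t_real, t_fake)
loss_after = guidance_loss(z, t_real, t_fake)
```

After the loop the embedding sits closer to the authentic anchor than to the forgery anchor, which is the condition a CLIP-backed detector would read as "real"; because downstream detectors share this feature space, the attack transfers without querying them.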