Vulnerabilities in AI-generated Image Detection: The Challenge of Adversarial Attacks

Recent advances in image synthesis, particularly generative adversarial networks (GANs) and diffusion models, have amplified public concern about the spread of disinformation. To address this concern, numerous AI-generated image (AIGI) detectors have been proposed, and they achieve promising performance in identifying fake images. However, a systematic understanding of the adversarial robustness of AIGI detectors is still lacking. In this paper, we examine the vulnerability of state-of-the-art AIGI detectors to adversarial attacks under white-box and black-box settings, a question that has rarely been investigated so far. To this end, we propose a new method for attacking AIGI detectors. First, motivated by the clear difference between real and fake images in the frequency domain, we add perturbations in the frequency domain to push the image away from its original frequency distribution. Second, we explore the full posterior distribution of the surrogate model to further narrow the gap between heterogeneous AIGI detectors, e.g., to transfer adversarial examples across CNNs and ViTs. This is achieved by a novel post-train Bayesian strategy that turns a single pre-trained surrogate into a Bayesian one capable of simulating diverse victim models, without re-training. We name our method the Frequency-based Post-train Bayesian Attack (FPBA). Through FPBA, we demonstrate that adversarial attacks pose a real threat to AIGI detectors: FPBA delivers successful black-box attacks across various detectors, generators, and defense methods, and even evades cross-generator detection and compressed-image detection, both of which are crucial real-world detection scenarios. Our code is available at https://github.com/onotoa/fpba.
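
The abstract names the two ingredients of the attack (a frequency-domain perturbation and a Bayesian surrogate derived from one pre-trained model) but not their exact formulation. The sketch below is a toy illustration under our own assumptions, not the FPBA implementation from the linked repository. It assumes a square grayscale image in [0, 1], an orthonormal 2D DCT as the frequency transform, and a hypothetical linear (logistic) surrogate detector. It also approximates the "post-train Bayesian" surrogate by sampling Gaussian perturbations around the fixed pre-trained weights and averaging loss gradients over the samples. All names and parameters (`dct_matrix`, `fake_prob`, `frequency_bayesian_attack`, `sigma`, `n_samples`, `eps`, `alpha`) are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix D, so that X = D @ x @ D.T and x = D.T @ X @ D."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # spatial index (columns)
    d = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    d[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale
    return d


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fake_prob(x, w, b):
    """Hypothetical linear surrogate detector: returns P(fake | x)."""
    return sigmoid(np.sum(w * x) + b)


def frequency_bayesian_attack(x, w, b, eps=8 / 255, alpha=1 / 255, steps=10,
                              n_samples=8, sigma=0.05, seed=0):
    """Toy sign-gradient attack whose variable lives in the DCT domain.

    Assumption (not the paper's method): the Bayesian surrogate is mimicked by
    drawing w_k = w + sigma * N(0, I) around the fixed pre-trained weights.
    """
    rng = np.random.default_rng(seed)
    d = dct_matrix(x.shape[0])  # assumes a square image
    delta_freq = np.zeros_like(x)  # perturbation expressed as DCT coefficients
    for _ in range(steps):
        x_adv = np.clip(x + d.T @ delta_freq @ d, 0.0, 1.0)
        grad_x = np.zeros_like(x)
        for _ in range(n_samples):
            # Gaussian weight sample around the pre-trained surrogate (no retraining).
            w_k = w + sigma * rng.standard_normal(w.shape)
            p = fake_prob(x_adv, w_k, b)
            # Pixel gradient of -log P(fake); ascending it lowers P(fake).
            grad_x += -(1.0 - p) * w_k
        grad_x /= n_samples
        # Chain rule through x = D^T Delta D gives dL/dDelta = D (dL/dx) D^T.
        grad_freq = d @ grad_x @ d.T
        delta_freq += alpha * np.sign(grad_freq)
        # Project the pixel-space perturbation onto the L_inf ball and the
        # valid pixel range, then map it back to DCT coefficients.
        r = np.clip(d.T @ delta_freq @ d, -eps, eps)
        r = np.clip(x + r, 0.0, 1.0) - x
        delta_freq = d @ r @ d.T
    return np.clip(x + d.T @ delta_freq @ d, 0.0, 1.0)


# Usage on random data (illustrative only).
rng = np.random.default_rng(1)
x = rng.uniform(0.3, 0.7, size=(8, 8))
w = rng.standard_normal((8, 8))
b = 2.0 - float(np.sum(w * x))  # clean logit is 2, i.e. classified as fake
x_adv = frequency_bayesian_attack(x, w, b)
print(fake_prob(x, w, b), fake_prob(x_adv, w, b))
```

Sampling weights around a single pre-trained surrogate is one inexpensive way to imitate an ensemble of victim models without retraining; the posterior approximation actually used in FPBA may differ, and a real detector would be a deep network with gradients obtained by automatic differentiation rather than the closed-form gradient used here.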
