The security of AI-generated content (AIGC) detection based on GANs and diffusion models is closely tied to the credibility of multimedia content. Malicious adversarial attacks can evade these evolving AIGC detectors. However, most existing adversarial attacks focus only on the detection of GAN-generated facial images, struggle to remain effective against multi-class natural images and diffusion-based detectors, and exhibit poor invisibility. To fill this gap, we first conduct an in-depth analysis of the vulnerability of AIGC detectors and find that detectors vary in their vulnerability to different post-processing operations. Then, considering the uncertainty of detectors in real-world scenarios, and building on this finding, we propose a Realistic-like Robust Black-box Adversarial attack (R$^2$BA) with post-processing fusion optimization. Unlike typical perturbations, R$^2$BA uses real-world post-processing, i.e., Gaussian blur, JPEG compression, Gaussian noise, and light spots, to generate adversarial examples. Specifically, we use a stochastic particle swarm algorithm with inertia decay to optimize the post-processing fusion intensities and explore the detector's decision boundary. Guided by the detector's fake probability, R$^2$BA strengthens the detector-vulnerable post-processing intensities and weakens the detector-robust ones, striking a balance between adversariality and invisibility. Extensive experiments on popular and commercial AIGC detectors and datasets demonstrate that R$^2$BA exhibits impressive anti-detection performance, excellent invisibility, and strong robustness in both GAN-based and diffusion-based cases. Compared to state-of-the-art white-box and black-box attacks, R$^2$BA achieves significant improvements of 15% and 21% in anti-detection performance under the original and robust scenarios, respectively, offering valuable insights for the security of AIGC detection in real-world applications.
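The core optimizer described above, a stochastic particle swarm with decaying inertia searching over post-processing intensities, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `fitness` callback (standing in for a black-box detector's fake probability over a post-processing intensity vector), the particle counts, and the inertia schedule are all illustrative assumptions.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=20, iters=50,
                 w_start=0.9, w_end=0.4, c1=1.5, c2=1.5,
                 lo=0.0, hi=1.0, seed=0):
    """Stochastic PSO with a linearly decaying inertia weight.

    `fitness` maps an intensity vector in [lo, hi]^dim to a scalar,
    e.g., a black-box detector's fake probability for the image
    post-processed at those intensities; lower is better.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))   # particle positions: fusion intensities
    v = np.zeros_like(x)                          # particle velocities
    pbest = x.copy()                              # per-particle best positions
    pbest_f = np.array([fitness(p) for p in x])
    g = pbest[np.argmin(pbest_f)].copy()          # swarm-wide best position
    g_f = pbest_f.min()

    for t in range(iters):
        # Inertia decays linearly: broad exploration early, refinement late.
        w = w_start - (w_start - w_end) * t / max(iters - 1, 1)
        r1, r2 = rng.random((2, n_particles, dim))  # stochastic attraction weights
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)                  # keep intensities in valid range
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        if f.min() < g_f:
            g, g_f = x[np.argmin(f)].copy(), f.min()
    return g, g_f

# Toy stand-in for a detector score with optimum at intensity 0.3 per channel.
best_x, best_f = pso_minimize(lambda p: float(np.sum((p - 0.3) ** 2)), dim=3)
```

In the attack itself, the fitness would query the target detector on the image after applying the fused post-processing (blur, JPEG compression, noise, light spot) at the candidate intensities, so that the swarm migrates toward the detector's decision boundary while the clipping step bounds perturbation strength for invisibility.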