The rapid advances in generative AI models have empowered the creation of highly realistic images with arbitrary content, raising concerns about potential misuse and harm, such as Deepfakes. Current research focuses on training detectors using large datasets of generated images. However, these training-based solutions are often computationally expensive and show limited generalization to unseen generated images. In this paper, we propose a training-free method to distinguish between real and AI-generated images. We first observe that real images are more robust to tiny noise perturbations than AI-generated images in the representation space of vision foundation models. Based on this observation, we propose RIGID, a training-free and model-agnostic method for robust AI-generated image detection. RIGID is a simple yet effective approach that identifies whether an image is AI-generated by comparing the representation similarity between the original image and a noise-perturbed counterpart. Our evaluation on a diverse set of AI-generated images and benchmarks shows that RIGID significantly outperforms existing training-based and training-free detectors. In particular, the average performance of RIGID exceeds the current best training-free method by more than 25%. Importantly, RIGID exhibits strong generalization across different image generation methods and robustness to image corruptions.
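The detection rule described above can be sketched as follows. This is a minimal, hedged illustration, not the paper's implementation: `embed` stands in for a frozen vision foundation model encoder (the paper uses such a model; here it is a toy deterministic feature map so the sketch runs without any weights), and the threshold value is hypothetical.

```python
import random
import math

def embed(image):
    # Toy stand-in for a foundation-model encoder: a few smooth,
    # nonlinear features of the pixel vector. Illustrative only.
    s = sum(image)
    return [math.sin(s), math.cos(s), s / (1 + len(image))]

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rigid_score(image, noise_std=0.01, seed=0):
    """Similarity between embeddings of an image and a noise-perturbed copy.

    Per the paper's observation, real images tend to yield higher similarity
    (more robust representations) than AI-generated ones.
    """
    rng = random.Random(seed)
    perturbed = [p + rng.gauss(0.0, noise_std) for p in image]
    return cosine(embed(image), embed(perturbed))

def is_real(image, threshold=0.99):
    # The threshold here is a made-up placeholder; in practice it would
    # be chosen on held-out data.
    return rigid_score(image) >= threshold
```

The key design point is that no training is involved: the encoder is used as-is, and the only computation per image is one extra forward pass on the perturbed copy plus a similarity comparison.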