With the rapid proliferation of powerful image generators, accurate detection of AI-generated images has become essential for maintaining a trustworthy online environment. However, existing deepfake detectors often generalize poorly to images produced by unseen generators. Notably, despite being trained under vastly different paradigms, such as diffusion or autoregressive modeling, many modern image generators share common final architectural components that serve as the last stage for converting intermediate representations into images. Motivated by this insight, we propose to "contaminate" real images using the generator's final component and train a detector to distinguish them from the original real images. We further introduce a taxonomy based on generators' final components and categorize 21 widely used generators accordingly, enabling a comprehensive investigation of our method's generalization capability. Using only 100 samples from each of three representative categories, our detector, fine-tuned on the DINOv3 backbone, achieves an average accuracy of 98.83% across 22 testing sets from unseen generators.
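The contamination idea above can be illustrated with a minimal toy sketch. Here, `fake_final_component` is a hypothetical stand-in for a real generator's final component (in practice this would be, e.g., a pretrained VAE's encode/decode path): a lossy round trip that imprints subtle reconstruction artifacts on a real image. The pairing function then labels originals 0 and their contaminated counterparts 1, which is the kind of supervision the abstract describes; none of these function names come from the paper.

```python
import numpy as np

def fake_final_component(img: np.ndarray, factor: int = 2) -> np.ndarray:
    """Toy stand-in (an assumption, not the paper's component) for a
    generator's final stage: a lossy downsample/upsample round trip that
    leaves reconstruction artifacts analogous to a VAE encode/decode."""
    h, w = img.shape[:2]
    small = img[::factor, ::factor]                          # lossy "encode"
    up = np.repeat(np.repeat(small, factor, 0), factor, 1)   # crude "decode"
    return up[:h, :w]

def make_training_pairs(real_images):
    """Label each real image 0 and its contaminated counterpart 1,
    mirroring the idea of training a detector to separate originals
    from final-component reconstructions."""
    xs, ys = [], []
    for img in real_images:
        xs.append(img)
        ys.append(0)
        xs.append(fake_final_component(img))
        ys.append(1)
    return xs, ys

rng = np.random.default_rng(0)
reals = [rng.random((8, 8, 3)) for _ in range(4)]
xs, ys = make_training_pairs(reals)
print(len(xs), ys[:4])  # → 8 [0, 1, 0, 1]
```

In the paper's actual pipeline, the detector would be a DINOv3 backbone fine-tuned on such pairs; the toy component here merely shows why the contaminated positives stay pixel-aligned with their real counterparts, so the detector must rely on the reconstruction artifacts themselves.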