As generative Artificial Intelligence (AI) advances, the realism of AI-generated imagery has reached a threshold capable of deceiving even vigilant human observers. Yet while current AI-generated Image Detection (AID) approaches perform exceptionally well on controlled benchmark datasets, they struggle significantly with real-world cases. To study this behavior, we introduce the ITW-SM dataset, a curated collection of real and AI-generated images originating from major social media platforms. We use it to analyze the effects of key design choices typically considered when building a detector: its architecture, pre-trained latent spaces, training data, and pre-processing approaches. We show that naively scaling the pre-training stage or adding more training data does not always lead to better detection performance. Instead, our work reveals that it is crucial to optimize each design choice so that the processing pipeline can propagate and effectively analyze both low-level traces and high-level image semantics. Building on our findings, we achieve a substantial average improvement of 26.87% in AUC across multiple state-of-the-art detection approaches under real-world conditions, providing a roadmap for developing more resilient detectors. Our assets are available at https://mever-team.github.io/itw-sm.