WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection

The extraordinary ability of generative models enabled the generation of images with such high quality that human beings cannot distinguish Artificial Intelligence (AI) generated images from real-life photographs. The development of generation techniques opened up new opportunities but concurrently introduced potential risks to privacy, authenticity, and security. Therefore, the task of detecting AI-generated imagery is of paramount importance to prevent illegal activities. To assess the generalizability and robustness of AI-generated image detection, we present a large-scale dataset, referred to as WildFake, comprising state-of-the-art generators, diverse object categories, and real-world applications. WildFake dataset has the following advantages: 1) Rich Content with Wild collection: WildFake collects fake images from the open-source community, enriching its diversity with a broad range of image classes and image styles. 2) Hierarchical structure: WildFake contains fake images synthesized by different types of generators from GANs, diffusion models, to other generative models. These key strengths enhance the generalization and robustness of detectors trained on WildFake, thereby demonstrating WildFake's considerable relevance and effectiveness for AI-generated detectors in real-world scenarios. Moreover, our extensive evaluation experiments are tailored to yield profound insights into the capabilities of different levels of generative models, a distinctive advantage afforded by WildFake's unique hierarchical structure.

翻译：生成模型凭借其卓越能力，已能生成质量高到人类难以区分人工智能生成图像与真实照片的图像。生成技术的发展开创了新机遇，但同时也给隐私、真实性和安全性带来了潜在风险。因此，检测AI生成图像的任务对于预防非法活动至关重要。为评估AI生成图像检测的泛化性和鲁棒性，我们提出一个名为WildFake的大规模数据集，该数据集包含最先进的生成器、多样化的物体类别及真实世界应用场景。WildFake数据集具有以下优势：1）丰富的野生采集内容：WildFake从开源社区收集虚假图像，通过广泛的图像类别与风格增强多样性；2）层级结构：WildFake包含由生成对抗网络、扩散模型及其他生成模型等不同类型生成器合成的虚假图像。这些核心优势提升了在WildFake上训练的检测器的泛化能力与鲁棒性，从而证明WildFake在真实场景中对AI生成检测器的重要相关性与有效性。此外，我们定制的广泛评估实验为深入理解不同层级生成模型的能力提供了深刻见解——这正是WildFake独特层级结构赋予的显著优势。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日