Fake or JPEG? Revealing Common Biases in Generated Image Detection Datasets

The widespread adoption of generative image models has highlighted the urgent need to detect artificial content, a crucial step in combating manipulation and misinformation. Consequently, numerous detectors and associated datasets have emerged. However, many of these datasets inadvertently introduce undesirable biases, which impair both the effectiveness and the evaluation of detectors. In this paper, we emphasize that many datasets for AI-generated image detection contain biases related to JPEG compression and image size. Using the GenImage dataset, we demonstrate that detectors do learn from these undesired factors. Furthermore, we show that removing these biases substantially increases robustness to JPEG compression and significantly alters the cross-generator performance of the evaluated detectors. Specifically, it yields an increase of more than 11 percentage points in cross-generator performance for ResNet50 and Swin-T detectors on the GenImage dataset, achieving state-of-the-art results. We provide the dataset and source code for this paper on the anonymous website: https://www.unbiased-genimage.org
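As a rough illustration of how a format bias of this kind could be spotted, the sketch below tallies container formats per class label by reading each file's leading magic bytes. The function names and the `(label, header)` sample layout are assumptions made for illustration and are not taken from the paper's released code. A heavily skewed tally (for example, one class stored almost entirely as JPEG) would hint at a compression bias, although this check measures neither JPEG quality nor image size.

```python
from collections import Counter, defaultdict
from pathlib import Path

# Well-known file signatures: JPEG streams start with the SOI marker
# followed by another marker byte; PNG files start with an 8-byte signature.
JPEG_MAGIC = b"\xff\xd8\xff"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def sniff_format(header: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    return "other"


def read_header(path: Path, n: int = 8) -> bytes:
    """Read the first n bytes of a file; 8 covers both signatures."""
    with open(path, "rb") as f:
        return f.read(n)


def format_counts_by_label(samples):
    """Count formats per label.

    `samples` is an iterable of (label, header_bytes) pairs; this layout
    is an assumption, not a structure defined by the paper.
    """
    counts = defaultdict(Counter)
    for label, header in samples:
        counts[label][sniff_format(header)] += 1
    return {label: dict(c) for label, c in counts.items()}
```

In practice the samples could be built with something like `[(label, read_header(p)) for label, p in labelled_paths]`, where `labelled_paths` is a hypothetical list of `(label, Path)` pairs for the dataset being audited.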

