Successful forensic detectors can achieve excellent results on supervised learning benchmarks but struggle to transfer to real-world applications. We believe this limitation is largely due to inadequate training data quality. While most research focuses on developing new algorithms, less attention is given to training data selection, despite evidence that performance can be strongly impacted by spurious correlations such as content, format, or resolution. A well-designed forensic detector should detect generator-specific artifacts rather than reflect data biases. To this end, we propose B-Free, a bias-free training paradigm in which fake images are generated from real ones using the conditioning procedure of stable diffusion models. This ensures semantic alignment between real and fake images, so that any differences stem solely from the subtle artifacts introduced by AI generation. Through content-based augmentation, we show significant improvements in both generalization and robustness over state-of-the-art detectors, as well as better-calibrated results across 27 different generative models, including recent releases such as FLUX and Stable Diffusion 3.5. Our findings emphasize the importance of careful dataset design, highlighting the need for further research on this topic. Code and data are publicly available at https://grip-unina.github.io/B-Free/.