In recent years, image generation and manipulation have progressed remarkably thanks to the rapid development of generative AI based on deep learning. Recent studies have devoted significant effort to addressing face image manipulation produced by deepfake techniques. However, the detection of purely synthesized face images has been explored to a lesser extent. In particular, the recently popular Diffusion Models (DMs) have shown remarkable success in image synthesis, and existing detectors struggle to generalize across images synthesized by different generative models. In this work, a comprehensive benchmark of human face images produced by Generative Adversarial Networks (GANs) and a variety of DMs is established to evaluate both the generalization ability and the robustness of state-of-the-art detectors. The forgery traces introduced by different generative models are then analyzed in the frequency domain to draw several insights. The paper further demonstrates that a detector trained on a frequency representation can generalize well to unseen generative models.
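To make the notion of a frequency representation concrete, the following is a minimal sketch of one common choice, the centered log-magnitude spectrum of a 2D discrete Fourier transform. It is an illustration under stated assumptions and not necessarily the representation used in this work: the function name `log_magnitude_spectrum`, the grayscale input, and the 64x64 random array standing in for a face crop are all hypothetical.

```python
import numpy as np


def log_magnitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Return the centered log-magnitude spectrum of a 2D grayscale image.

    Hypothetical helper for illustration; the paper's exact frequency
    representation is not specified in the abstract.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2D grayscale array")
    # 2D FFT, then shift the zero-frequency (DC) term to the array center.
    spectrum = np.fft.fftshift(np.fft.fft2(image.astype(np.float64)))
    # log1p compresses the large dynamic range of the magnitudes.
    return np.log1p(np.abs(spectrum))


# Synthetic stand-in for a face crop (assumption: real inputs would be
# grayscale face images of a fixed size).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
features = log_magnitude_spectrum(image)
```

The resulting array has the same spatial shape as the input and could be fed to a classifier in place of the pixel values; whether to use the magnitude alone, the phase, or another transform is a design choice left open here.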