With the rapid development of deep generative models (such as Generative Adversarial Networks and Auto-encoders), AI-synthesized images of the human face have reached such high quality that humans can hardly distinguish them from pristine ones. Existing detection methods perform well in specific evaluation settings, e.g., on images from models seen during training or on images without real-world post-processing. However, they tend to degrade seriously in real-world scenarios, where test images may be produced by more powerful generation models or altered by various post-processing operations. To address this issue, we propose Global and Local Feature Fusion (GLFF), which learns rich and discriminative representations for face forgery detection by combining multi-scale global features from the whole image with refined local features from informative patches. GLFF fuses information from two branches: a global branch that extracts multi-scale semantic features, and a local branch that selects informative patches and extracts detailed local artifacts from them. Given the lack of a face forgery dataset that simulates real-world applications for evaluation, we further create a challenging face forgery dataset, named DeepFakeFaceForensics (DF^3). It covers 6 state-of-the-art generation models and a variety of post-processing techniques to approximate real-world scenarios. Experimental results demonstrate the superiority of our method over state-of-the-art methods on the proposed DF^3 dataset and three other open-source datasets.
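To make the two-branch idea concrete, the following is a minimal NumPy sketch of global/local feature fusion. It is not the GLFF implementation. The three-scale backbone outputs, the patch-scoring rule (absolute deviation from the image mean), the mean/std patch extractor, and fusion by plain concatenation are all hypothetical stand-ins chosen for illustration. In the actual method these components are learned networks, and the abstract does not specify how patches are scored or how the branches are fused. The helper names (`multiscale_global_features`, `select_informative_patches`, `patch_stats`, `local_features`, `fuse`) are likewise illustrative.

```python
import numpy as np


def multiscale_global_features(feature_maps):
    """Global branch stand-in: average-pool each (C, H, W) feature map over
    its spatial axes and concatenate the per-scale channel vectors."""
    return np.concatenate([fm.mean(axis=(1, 2)) for fm in feature_maps])


def select_informative_patches(score_map, patch_size, k):
    """Local branch stand-in: tile a 2-D score map into non-overlapping
    patch_size x patch_size patches and return the top-left (row, col)
    of the k patches with the highest mean score."""
    h, w = score_map.shape
    candidates = []
    for r in range(0, h - patch_size + 1, patch_size):
        for c in range(0, w - patch_size + 1, patch_size):
            score = float(score_map[r:r + patch_size, c:c + patch_size].mean())
            candidates.append((score, r, c))
    # Highest score first; Python's sort is stable, so ties keep raster order.
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in candidates[:k]]


def patch_stats(patch):
    """Hypothetical local extractor: mean and standard deviation of a patch."""
    return np.array([patch.mean(), patch.std()])


def local_features(image, corners, patch_size, extractor=patch_stats):
    """Crop each selected patch from the image and concatenate its features."""
    return np.concatenate(
        [extractor(image[r:r + patch_size, c:c + patch_size]) for r, c in corners]
    )


def fuse(global_feat, local_feat):
    """Hypothetical fusion by concatenation into one representation."""
    return np.concatenate([global_feat, local_feat])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 64))
    # Hypothetical backbone outputs at three scales: (channels, height, width).
    maps = [rng.random((8, 32, 32)), rng.random((16, 16, 16)), rng.random((32, 8, 8))]
    global_feat = multiscale_global_features(maps)    # 8 + 16 + 32 = 56 values
    score_map = np.abs(image - image.mean())          # hypothetical informativeness score
    corners = select_informative_patches(score_map, 16, k=3)
    local_feat = local_features(image, corners, 16)   # 3 patches x 2 stats = 6 values
    print(fuse(global_feat, local_feat).shape)        # (62,)
```

The sketch separates the two roles named in the abstract. The global branch summarizes the whole image at several scales. The local branch first decides *where* to look (patch selection) and only then extracts fine-grained features from those regions. Replacing the stand-ins with learned modules (a CNN backbone, a learned patch scorer, a trained fusion layer) would be the natural next step.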