Synthetically generated face images have been shown to be indistinguishable from real images by humans and can therefore undermine trust in digital content, for instance when they are used to spread misinformation. There is thus a clear need for algorithms that detect entirely synthetic face images. Of particular interest are images generated by state-of-the-art deep learning-based models, as these exhibit a high level of visual realism. Recent works have demonstrated that detecting such synthetic face images under realistic conditions remains difficult, as new and improved generative models are proposed at a rapid pace and arbitrary image post-processing can be applied. In this work, we propose a multi-channel architecture for detecting entirely synthetic face images which analyses information in both the frequency and visible spectra and is supervised using Cross Modal Focal Loss. We compare the proposed architecture with several related architectures trained using Binary Cross Entropy and show in cross-model experiments that the proposed architecture supervised using Cross Modal Focal Loss, in general, achieves the most competitive performance.
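To illustrate the supervision signal named above, the following is a minimal sketch of a Cross Modal Focal Loss term for a two-branch detector. The abstract does not state the formula, so the weighting below is an assumption that follows the formulation commonly given for Cross Modal Focal Loss in the RGB-D face anti-spoofing literature. The function names `cmfl` and `total_loss`, the branch names `p_freq` and `p_rgb`, and the values of `alpha`, `gamma`, and `lam` are hypothetical and are not taken from this work.

```python
import math

EPS = 1e-7  # numerical guard against log(0) and division by zero


def cmfl(p, q, alpha=1.0, gamma=2.0):
    """Assumed Cross Modal Focal Loss term for one branch.

    p: probability the current branch assigns to the true class.
    q: probability the other branch assigns to the true class.
    alpha, gamma: focal-loss style hyperparameters (illustrative values).
    """
    # Weight: harmonic mean of p and q, scaled by the other branch's confidence.
    w = q * (2.0 * p * q) / (p + q + EPS)
    # The loss shrinks when the other branch is already confidently correct.
    return -alpha * (1.0 - w) ** gamma * math.log(max(p, EPS))


def total_loss(p_joint, p_freq, p_rgb, lam=0.5):
    """Assumed combination: cross entropy on a joint head plus both CMFL terms."""
    ce_joint = -math.log(max(p_joint, EPS))
    cross = cmfl(p_freq, p_rgb) + cmfl(p_rgb, p_freq)
    return (1.0 - lam) * ce_joint + lam * cross


if __name__ == "__main__":
    # Same frequency-branch confidence, different visible-spectrum confidence.
    print(cmfl(0.5, 0.9), cmfl(0.5, 0.1))
    print(total_loss(p_joint=0.8, p_freq=0.6, p_rgb=0.9))
```

Under this assumed form, a sample that one branch already classifies confidently contributes less to the loss of the other branch, so each branch concentrates on samples that neither channel handles well.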