Synthetically generated face images have been shown to be indistinguishable from real images by humans. They can therefore erode trust in digital content, for instance when used to spread misinformation, so there is a clear need for algorithms that detect entirely synthetic face images. Of particular interest are images generated by state-of-the-art deep learning-based models, as these exhibit a high level of visual realism. Recent works have demonstrated that detecting such synthetic face images under realistic conditions remains difficult, because new and improved generative models are proposed at a rapid pace and arbitrary post-processing can be applied to the images. In this work, we propose a multi-channel architecture for detecting entirely synthetic face images that analyses information in both the frequency and visible spectra and is supervised using Cross Modal Focal Loss. We compare the proposed architecture with several related architectures trained using Binary Cross Entropy and show in cross-model experiments that the proposed architecture, supervised using Cross Modal Focal Loss, in general achieves the most competitive performance.
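To make the supervision signal more concrete, the following is a minimal sketch of how a Cross Modal Focal Loss term could be computed for two branches, here taken to be the visible-spectrum and frequency-spectrum channels. The abstract does not state the formula, so the weighting function, the parameters `alpha`, `gamma`, and `lam`, the joint-head probability `r_t`, and all function names are assumptions based on the commonly cited form of the loss, not the authors' implementation.

```python
import math

EPS = 1e-7  # assumed clamp value to keep log() finite


def _clamp(p: float) -> float:
    """Clamp a probability into the open interval (0, 1)."""
    return min(max(p, EPS), 1.0 - EPS)


def bce(p_t: float) -> float:
    """Binary cross entropy, given the probability assigned to the true class."""
    return -math.log(_clamp(p_t))


def cmfl(p_t: float, q_t: float, alpha: float = 1.0, gamma: float = 2.0) -> float:
    """Cross Modal Focal Loss term for one branch (assumed form).

    p_t: probability the current branch assigns to the true class.
    q_t: probability the other branch assigns to the true class.
    """
    p_t, q_t = _clamp(p_t), _clamp(q_t)
    # Harmonic mean of both confidences, scaled by the other branch's confidence.
    w = q_t * (2.0 * p_t * q_t) / (p_t + q_t)
    # Focal-style modulation: down-weight samples both branches already get right.
    return -alpha * (1.0 - w) ** gamma * math.log(p_t)


def total_loss(p_t: float, q_t: float, r_t: float, lam: float = 0.5) -> float:
    """Blend the joint-head cross entropy with both per-branch CMFL terms.

    r_t: probability a hypothetical joint head assigns to the true class.
    lam: assumed mixing weight between the joint and per-branch terms.
    """
    return (1.0 - lam) * bce(r_t) + lam * (cmfl(p_t, q_t) + cmfl(q_t, p_t))
```

Under these assumptions, a sample that both branches classify confidently and correctly contributes little to the loss, whereas a sample on which the other branch is unsure leaves the current branch with a term close to ordinary cross entropy.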