The widespread use of diffusion methods makes it possible to create highly realistic images on demand, which poses significant risks to the integrity and safety of online information and underscores the need for DeepFake detection. Our analysis of features extracted by traditional image encoders reveals that low-level and high-level features each offer distinct advantages in identifying DeepFake images produced by various diffusion methods. Motivated by this finding, we seek an effective representation that captures both low-level and high-level features for detecting diffusion-based DeepFakes. To this end, we propose a text modality-oriented feature extraction method, termed TOFE. Specifically, for a given target image, the representation is the text embedding that guides a specific text-to-image model to generate that image. Experiments across ten diffusion types demonstrate the efficacy of the proposed method.
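The central idea above, recovering an embedding that steers a frozen generator toward a given target image, can be illustrated with a toy sketch. This is not the TOFE implementation: the "generator" here is a hypothetical fixed map `tanh(W @ e)` standing in for a text-to-image model, and the function name `invert_to_embedding`, the dimensions, and the mean-squared-error objective are all assumptions made purely for illustration.

```python
import numpy as np

def invert_to_embedding(target, W, steps=300, lr=0.5, seed=0):
    """Optimize an embedding e so the frozen toy generator tanh(W @ e)
    approximates `target`. Only e is updated; W stays fixed."""
    rng = np.random.default_rng(seed)
    e = rng.normal(scale=0.1, size=W.shape[1])
    losses = []
    for _ in range(steps):
        out = np.tanh(W @ e)                      # toy "generated image"
        residual = out - target
        losses.append(float(np.mean(residual ** 2)))
        # Gradient of mean((tanh(W e) - x)^2) with respect to e.
        grad = (2.0 / target.size) * (W.T @ (residual * (1.0 - out ** 2)))
        e -= lr * grad
    return e, losses

# Toy setup (all values hypothetical): a 64-"pixel" image and an 8-dim embedding.
rng = np.random.default_rng(1)
W = rng.normal(size=(64, 8)) / np.sqrt(8)         # frozen generator weights
target = np.tanh(W @ rng.normal(size=8))          # an image the toy model can produce
e_hat, losses = invert_to_embedding(target, W)
print(f"reconstruction loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

In a detection pipeline of the kind the abstract describes, the recovered embedding (`e_hat` here) would serve as the feature passed to a downstream real-versus-fake classifier; that classifier is omitted from this sketch.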