The emergence of text-to-image generative models has revolutionized the field of deepfakes, enabling the creation of realistic and convincing visual content directly from textual descriptions. However, this advancement makes it considerably more challenging to verify the authenticity of such content. Existing deepfake detection datasets and methods often fall short in capturing the extensive range of emerging deepfakes and in offering satisfactory explanatory information for detection. To address this issue, this paper introduces a deepfake database (DFLIP-3K) for the development of convincing and explainable deepfake detection. It encompasses about 300K diverse deepfake samples from approximately 3K generative models, the largest number of deepfake models in the literature. Moreover, it collects around 190K linguistic footprints of these deepfakes. These two distinguishing features enable DFLIP-3K to support a benchmark that promotes progress in linguistic profiling of deepfakes, comprising three sub-tasks: deepfake detection, model identification, and prompt prediction. The deepfake model and prompt are two essential components of each deepfake, so dissecting them linguistically allows for an exploration of trustworthy and interpretable evidence in deepfake detection, which we believe is key to next-generation deepfake detection. Furthermore, DFLIP-3K is envisioned as an open database that fosters transparency and encourages collaborative efforts to further its growth. Our extensive experiments on the developed benchmark verify that DFLIP-3K can serve as a standardized resource for evaluating and comparing linguistic-based deepfake detection, identification, and prompt prediction techniques.