The recent renaissance in generative models, driven primarily by the advent of diffusion models and iterative improvements in GAN methods, has enabled many creative applications. However, each advancement also increases the potential for misuse. In deepfake generation, this is a key societal issue. In particular, the ability to modify segments of videos using such generative techniques creates a new paradigm of deepfakes: videos that are mostly real but have been altered slightly to distort the truth. Current deepfake detection methods in the academic literature are not evaluated on this paradigm. In this paper, we present a deepfake detection method that addresses this issue by performing both frame-level and video-level deepfake prediction. To test our method, we create a new benchmark dataset in which videos contain both real and fake frame sequences. Our method uses the Vision Transformer, Scaling and Shifting pretraining, and a Timeseries Transformer to temporally segment videos, which helps with interpreting possible deepfakes. Extensive experiments on a variety of deepfake generation methods show excellent results on both temporal segmentation and classical video-level prediction. In particular, the paradigm we introduce will form a powerful tool for deepfake moderation, allowing human oversight to be targeted at the parts of videos suspected of being deepfakes. All experiments can be reproduced at: https://github.com/sanjaysaha1311/temporal-deepfake-segmentation.
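The abstract does not say how frame-level predictions are turned into temporal segments or a video-level label. The sketch below is only a hedged illustration. It assumes the model emits one fake probability per frame and groups consecutive frames above a threshold into segments. It then flags a video as fake if any segment remains. The names `segment_frames`, `video_is_fake`, `threshold`, and `min_len` are hypothetical and are not taken from the authors' repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    start: int  # first frame index of the suspected fake run (inclusive)
    end: int    # frame index just past the run (exclusive)


def segment_frames(probs: List[float], threshold: float = 0.5, min_len: int = 1) -> List[Segment]:
    """Hypothetical post-processing: group consecutive frames whose fake
    probability is >= threshold, keeping only runs of at least min_len frames."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i  # a suspected fake run begins
        elif p < threshold and start is not None:
            if i - start >= min_len:
                segments.append(Segment(start, i))
            start = None  # the run has ended
    # Close a run that extends to the final frame.
    if start is not None and len(probs) - start >= min_len:
        segments.append(Segment(start, len(probs)))
    return segments


def video_is_fake(segments: List[Segment]) -> bool:
    # Assumed aggregation rule: any surviving fake segment marks the whole video as fake.
    return len(segments) > 0


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.9, 0.95, 0.8, 0.3, 0.1, 0.7, 0.1]
    print(segment_frames(probs))             # → [Segment(start=2, end=5), Segment(start=7, end=8)]
    print(segment_frames(probs, min_len=2))  # → [Segment(start=2, end=5)]
    print(video_is_fake(segment_frames(probs)))  # → True
```

A minimum run length like `min_len` is one simple way to suppress isolated single-frame false positives. The actual method may instead rely on the Timeseries Transformer to produce temporally consistent predictions directly.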