The threats that deepfakes pose to society and cybersecurity have provoked significant public concern, driving intensified efforts in deepfake video detection. Current video-level methods are mostly based on 3D CNNs, which achieve good performance but incur high computational cost. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pre-defined layout to preserve spatial and temporal dependencies. The transformation masks consecutive frames at the same position within each frame; the frames are then resized into sub-frames and rearranged into the pre-defined layout to form a thumbnail. TALL is model-agnostic and requires only minimal code modifications. Furthermore, we introduce a graph reasoning block (GRB) and a semantic consistency (SC) loss to strengthen TALL, yielding TALL++. The GRB enhances interactions between different semantic regions to capture semantic-level inconsistency clues. The SC loss imposes consistency constraints on semantic features to improve the model's generalization ability. Extensive experiments on intra-dataset evaluation, cross-dataset evaluation, diffusion-generated image detection, and deepfake generation method recognition show that TALL++ achieves results surpassing or comparable to state-of-the-art methods, demonstrating the effectiveness of our approaches for various deepfake detection problems. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
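The thumbnail transformation described in the abstract (mask the same position in every frame, resize each frame to a sub-frame, arrange the sub-frames in a fixed grid) can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the code from the linked repository: the function name `make_thumbnail`, the 2x2 default layout, the sub-frame size, the square zero-valued mask, and the nearest-neighbour resize are all assumptions made for the example.

```python
import numpy as np


def make_thumbnail(clip, rows=2, cols=2, sub_size=112, mask_size=16, rng=None):
    """Arrange a clip of shape (T, H, W, C) into one (rows*sub_size, cols*sub_size, C) image.

    Hypothetical helper: parameter names and defaults are assumptions,
    not values taken from the paper or its repository.
    """
    t, h, w, c = clip.shape
    if t != rows * cols:
        raise ValueError("number of frames must equal rows * cols")
    if mask_size > min(h, w):
        raise ValueError("mask_size must not exceed the frame size")
    rng = np.random.default_rng() if rng is None else rng
    clip = clip.copy()

    # Mask the same square region in every frame of the clip.
    if mask_size > 0:
        y = int(rng.integers(0, h - mask_size + 1))
        x = int(rng.integers(0, w - mask_size + 1))
        clip[:, y:y + mask_size, x:x + mask_size, :] = 0

    # Nearest-neighbour resize of every frame to sub_size x sub_size
    # (chosen here only to avoid extra dependencies).
    ys = np.arange(sub_size) * h // sub_size
    xs = np.arange(sub_size) * w // sub_size
    sub = clip[:, ys][:, :, xs]  # shape (T, sub_size, sub_size, C)

    # Place frame k at grid cell (k // cols, k % cols), i.e. row-major order.
    grid = sub.reshape(rows, cols, sub_size, sub_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (rows, sub_size, cols, sub_size, C)
    return grid.reshape(rows * sub_size, cols * sub_size, c)
```

Because the output is an ordinary image tensor, it can be passed to a 2D image backbone without changing the model, which is consistent with the abstract's claim that TALL is model-agnostic and needs only minimal code changes.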