Towards Generalizable Deepfake Video Detection with Thumbnail Layout and Graph Reasoning

The threats that deepfakes pose to society and cybersecurity have provoked significant public concern, driving intensified efforts in deepfake video detection. Most current video-level methods are based on 3D CNNs; although they achieve good performance, they incur high computational costs. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a predefined layout that preserves spatial and temporal dependencies. The transformation masks each consecutive frame at the same position, resizes the frames into sub-frames, and rearranges the sub-frames into the predefined layout to form a thumbnail. TALL is model-agnostic and simple, requiring only minimal code modifications. Furthermore, we introduce a graph reasoning block (GRB) and a semantic consistency (SC) loss to strengthen TALL, yielding TALL++. GRB enhances interactions between different semantic regions to capture semantic-level inconsistency clues, while the SC loss imposes consistency constraints on semantic features to improve the model's generalization ability. Extensive experiments on intra-dataset and cross-dataset deepfake detection, diffusion-generated image detection, and deepfake generation method recognition show that TALL++ achieves results surpassing or comparable to state-of-the-art methods, demonstrating the effectiveness of our approaches across various deepfake detection problems. The code is available at https://github.com/rainy-xu/TALL4Deepfake.
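The mask, resize, and rearrange steps of TALL can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository): it assumes a clip of 4 frames arranged in a 2x2 grid, a single square mask whose position is sampled once per clip and shared by every frame, and nearest-neighbour resizing in place of whatever interpolation the original code uses. The names `thumbnail_layout`, `nearest_resize`, `grid`, and `mask_size` are hypothetical.

```python
import numpy as np


def nearest_resize(frame: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) frame (assumed stand-in for bilinear)."""
    h, w = frame.shape[:2]
    rows = (np.arange(out_h) * h) // out_h  # source row for each output row
    cols = (np.arange(out_w) * w) // out_w  # source column for each output column
    return frame[rows[:, None], cols[None, :]]


def thumbnail_layout(clip: np.ndarray, grid=(2, 2), mask_size=16, rng=None) -> np.ndarray:
    """Turn a (T, H, W, C) clip into one (H, W, C) thumbnail (hypothetical sketch).

    Assumptions: T == rows * cols, H and W divisible by the grid shape, and one
    square mask at a randomly chosen position that is identical in every frame.
    """
    rng = rng if rng is not None else np.random.default_rng()
    t, h, w, c = clip.shape
    n_rows, n_cols = grid
    if t != n_rows * n_cols:
        raise ValueError(f"expected {n_rows * n_cols} frames, got {t}")
    if h % n_rows or w % n_cols:
        raise ValueError("frame size must be divisible by the grid shape")
    if mask_size > min(h, w):
        raise ValueError("mask_size must not exceed the frame size")

    # Step 1: mask the same position in every frame of the clip.
    masked = clip.copy()
    top = rng.integers(0, h - mask_size + 1)
    left = rng.integers(0, w - mask_size + 1)
    masked[:, top:top + mask_size, left:left + mask_size, :] = 0

    # Step 2: shrink each masked frame into a sub-frame of size (H / rows, W / cols).
    sub_h, sub_w = h // n_rows, w // n_cols
    subs = [nearest_resize(frame, sub_h, sub_w) for frame in masked]

    # Step 3: tile sub-frames in temporal order, left to right, top to bottom.
    thumb = np.empty((h, w, c), dtype=clip.dtype)
    for i, sub in enumerate(subs):
        r, q = divmod(i, n_cols)
        thumb[r * sub_h:(r + 1) * sub_h, q * sub_w:(q + 1) * sub_w] = sub
    return thumb


if __name__ == "__main__":
    demo = np.random.default_rng(0).integers(0, 256, size=(4, 224, 224, 3)).astype(np.uint8)
    thumb = thumbnail_layout(demo, grid=(2, 2), mask_size=32, rng=np.random.default_rng(0))
    print(thumb.shape)  # (224, 224, 3)
```

In this sketch the thumbnail has the same spatial size as a single input frame, so it can be passed to a standard 2D image backbone without architectural changes, which is in line with the abstract's claim that TALL is model-agnostic and needs only minimal code modifications.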

翻译：深度伪造对社会和网络安全构成的威胁已引发公众广泛担忧，促使深度伪造视频检测领域的研究力度不断加大。当前基于视频级的方法大多采用三维卷积神经网络（3D CNN），虽取得良好性能，但计算开销较高。本文提出一种简洁而高效的策略——缩略图布局（TALL），通过将视频片段转换为预定义布局，实现空间与时间依赖关系的保留。该转换过程依次对每帧中相同位置进行遮罩处理，随后将这些帧缩放为子帧并重新组织成预定布局，最终形成缩略图。TALL具有与模型无关的特性，且实现极为简单，仅需极少量的代码修改。此外，我们引入图推理模块（GRB）和语义一致性损失（SC Loss）以增强TALL，最终形成TALL++。GRB通过增强不同语义区域间的交互，捕捉语义层面的不一致性线索；语义一致性损失则对语义特征施加一致性约束，以提升模型泛化能力。在数据集内检测、跨数据集检测、扩散生成图像检测及深度伪造生成方法识别等广泛实验表明，TALL++取得了超越或可比肩现有最优方法的结果，验证了该方法在多种深度伪造检测问题中的有效性。代码已开源至https://github.com/rainy-xu/TALL4Deepfake。