Sequential DeepFake detection is an emerging task that aims to predict a sequence of manipulations in the order in which they were applied. Existing methods typically formulate it as an image-to-sequence problem and perform detection with conventional Transformer architectures. However, these methods lack designs tailored to the task and consequently achieve limited performance. In this paper, we propose a novel Texture-aware and Shape-guided Transformer to improve detection performance. Our method features four major improvements. First, we design a texture-aware branch that captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. Second, we introduce a Bidirectional Interaction Cross-attention module that captures deep correlations between spatial and sequential features, enabling effective modeling of complex manipulation traces. Third, to further enhance the cross-attention, we propose a Shape-guided Gaussian mapping strategy that provides initial priors on the manipulation shape. Finally, observing that a later manipulation in a sequence may alter the traces left by an earlier one, we invert the prediction order from forward to backward, which yields notable gains. Extensive experimental results demonstrate that our method outperforms existing methods by a large margin.
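Two of the ideas above are simple enough to illustrate in isolation. The sketch below is a minimal, hypothetical illustration and not the authors' implementation. It assumes that manipulation sequences are lists of string labels, padded to a fixed length with a placeholder token `<pad>`. The names `to_backward_targets`, `to_forward_order`, and `central_pixel_differences` are illustrative. The pixel-difference function computes only one basic variant (neighbour minus centre); the specific differences combined inside the Diversiform Pixel Difference Attention module are not described here.

```python
from typing import List, Sequence

PAD = "<pad>"  # hypothetical padding token for fixed-length targets (assumption)


def to_backward_targets(steps: Sequence[str], max_len: int, pad: str = PAD) -> List[str]:
    """Reverse the manipulation order (last edit first), then right-pad.

    Reversing before padding keeps the padding at the tail, so a decoder
    still emits the real steps first and padding afterwards.
    """
    if len(steps) > max_len:
        raise ValueError("sequence longer than max_len")
    backward = list(reversed(steps))
    return backward + [pad] * (max_len - len(backward))


def to_forward_order(predicted: Sequence[str], pad: str = PAD) -> List[str]:
    """Drop padding from a backward prediction and restore the original order."""
    steps = [s for s in predicted if s != pad]
    return list(reversed(steps))


def central_pixel_differences(img: List[List[float]], r: int, c: int) -> List[float]:
    """Differences between the 8 neighbours of interior pixel (r, c) and the centre.

    A basic 'central' pixel difference: flat regions give zeros, while fine
    texture or blending artefacts give non-zero responses.
    """
    centre = img[r][c]
    diffs = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the centre pixel itself
            diffs.append(img[r + dr][c + dc] - centre)
    return diffs


if __name__ == "__main__":
    targets = to_backward_targets(["eyebrow", "nose", "lip"], max_len=5)
    print(targets)                    # ['lip', 'nose', 'eyebrow', '<pad>', '<pad>']
    print(to_forward_order(targets))  # ['eyebrow', 'nose', 'lip']
    spot = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(central_pixel_differences(spot, 1, 1))  # [-1, -1, -1, -1, -1, -1, -1, -1]
```

Because training uses the reversed targets, the decoder predicts the most recent manipulation first. `to_forward_order` then maps predictions back to the original order for reporting.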