Sequential DeepFake detection is an emerging task that aims to predict the sequence of facial manipulations in order. Existing methods typically formulate it as an image-to-sequence problem and employ conventional Transformer architectures. However, these architectures lack dedicated design for this task, which limits their performance. To address this, this paper presents a new Transformer design, called TSOM, that explores three perspectives: Texture, Shape, and Order of Manipulations. Our method features four major improvements: \ding{182} we describe a new texture-aware branch that effectively captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. \ding{183} We then introduce a Multi-source Cross-attention module to seek deep correlations between spatial and sequential features, enabling effective modeling of complex manipulation traces. \ding{184} To further enhance the cross-attention, we describe a Shape-guided Gaussian mapping strategy that provides initial priors on the manipulation shape. \ding{185} Finally, observing that a subsequent manipulation in a sequence may disturb the traces left by the preceding one, we invert the prediction order from forward to backward, which yields notable gains as expected. Extensive experimental results demonstrate that our method outperforms existing approaches by a large margin.
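To make the texture-aware idea in \ding{182} concrete: pixel-difference convolutions, a common way to expose subtle texture traces, respond to local intensity *differences* rather than raw intensities. The paper's Diversiform Pixel Difference Attention is not specified here, so the following is only a minimal sketch of one standard variant (central-difference convolution); the function name and the mixing factor `theta` are illustrative assumptions, not the paper's implementation.

```python
def central_difference_conv2d(x, w, theta=0.7):
    """Minimal sketch of a central-difference convolution.

    x: 2-D list (H x W) of pixel values; w: 2-D list (k x k) kernel.
    Output = vanilla convolution minus theta * (centre pixel * kernel sum);
    for theta=1 the response depends only on differences between each
    pixel and its neighbours, so flat regions give zero and subtle
    texture edits stand out.
    """
    h, wd = len(x), len(x[0])
    k = len(w)
    pad = k // 2
    ksum = sum(sum(row) for row in w)

    def px(i, j):
        # replicate-pad at the image borders
        i = min(max(i, 0), h - 1)
        j = min(max(j, 0), wd - 1)
        return x[i][j]

    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            vanilla = sum(
                w[a][b] * px(i + a - pad, j + b - pad)
                for a in range(k) for b in range(k)
            )
            out[i][j] = vanilla - theta * ksum * x[i][j]
    return out
```

With `theta=1` a constant image maps to all zeros, while `theta=0` recovers a plain convolution; intermediate values trade off the two responses.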
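For the Shape-guided Gaussian mapping in \ding{184}, one plausible reading is a soft spatial mask centred on a predicted manipulation region, used to bias cross-attention toward that region. The sketch below, with illustrative names and parameters not taken from the paper, shows how such a 2-D Gaussian prior map could be built.

```python
import math

def gaussian_prior_map(h, w, cy, cx, sigma):
    """Hypothetical sketch of a shape-guided Gaussian prior.

    Returns an h x w map peaking at (cy, cx) and decaying with
    distance; such a map could be added to attention logits so the
    model initially focuses on the assumed manipulation region.
    """
    return [
        [math.exp(-((i - cy) ** 2 + (j - cx) ** 2) / (2.0 * sigma ** 2))
         for j in range(w)]
        for i in range(h)
    ]
```

The map is maximal at the given centre and falls off smoothly, so it acts as an initial prior rather than a hard mask; `sigma` controls how tightly attention is concentrated.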