Robust Sequential DeepFake Detection

Since photorealistic faces can now be readily generated by facial manipulation technologies, the potential for malicious abuse of these technologies has raised serious concern. Numerous deepfake detection methods have therefore been proposed. However, existing methods focus only on detecting one-step facial manipulation. With the emergence of easily accessible facial editing applications, people can manipulate facial components through multiple operations applied in sequence. This new threat requires us to detect a sequence of facial manipulations, which is vital both for detecting deepfake media and for recovering the original faces afterwards. Motivated by this observation, we emphasize this need and propose a novel research problem called Detecting Sequential DeepFake Manipulation (Seq-DeepFake). Unlike the existing deepfake detection task, which demands only a binary label prediction, detecting Seq-DeepFake manipulation requires correctly predicting a sequential vector of facial manipulation operations. To support a large-scale investigation, we construct the first Seq-DeepFake dataset, in which face images are manipulated sequentially and annotated with the corresponding sequential facial manipulation vectors. Based on this new dataset, we cast detecting Seq-DeepFake manipulation as a specific image-to-sequence task and propose a concise yet effective Seq-DeepFake Transformer (SeqFakeFormer). To better reflect real-world deepfake data distributions, we further apply various perturbations to the original Seq-DeepFake dataset and construct a more challenging Sequential DeepFake dataset with perturbations (Seq-DeepFake-P). To exploit deeper correlations between images and sequences on Seq-DeepFake-P, we devise a dedicated Seq-DeepFake Transformer with Image-Sequence Reasoning (SeqFakeFormer++), which builds a stronger correspondence between image-sequence pairs for more robust Seq-DeepFake detection.
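
The difference between a binary label and a sequential manipulation vector can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the annotation scheme of the dataset itself: the operation vocabulary `OPERATIONS`, the maximum length `MAX_STEPS`, and the helper names `encode_sequence`, `binary_label`, and `exact_match` are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical label vocabulary (an assumption, not the dataset's actual one).
# Index 0 is reserved for padding sequences shorter than MAX_STEPS.
OPERATIONS = ["none", "eye", "nose", "lip", "hair", "eyebrow"]
MAX_STEPS = 5  # assumed maximum number of manipulation steps per image


def encode_sequence(steps: Sequence[str], max_steps: int = MAX_STEPS) -> List[int]:
    """Map an ordered list of operation names to a fixed-length index vector."""
    if len(steps) > max_steps:
        raise ValueError("sequence is longer than max_steps")
    ids = [OPERATIONS.index(step) for step in steps]  # ValueError on unknown names
    return ids + [0] * (max_steps - len(ids))  # pad the tail with "none"


def binary_label(steps: Sequence[str]) -> int:
    """Conventional deepfake label: 1 if any manipulation was applied, else 0."""
    return int(len(steps) > 0)


def exact_match(pred: Sequence[int], target: Sequence[int]) -> bool:
    """A prediction counts as correct only if every position matches, order included."""
    return list(pred) == list(target)


nose_then_eye = encode_sequence(["nose", "eye"])
eye_then_nose = encode_sequence(["eye", "nose"])
print(nose_then_eye, eye_then_nose)
print(exact_match(nose_then_eye, eye_then_nose))
```

Under these assumptions, the two images share the same binary label (both are manipulated) but have different sequential vectors, because the order of the operations differs. That ordering information is what a binary detector discards and what a sequence predictor would need to recover.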
