MMNet: Multi-Collaboration and Multi-Supervision Network for Sequential Deepfake Detection

Advanced manipulation techniques give criminals the opportunity to cause social panic or gain illicit profits by generating deceptive media such as forged face images. In response, various deepfake detection methods have been proposed to assess image authenticity. Sequential deepfake detection, an extension of deepfake detection, aims to identify the forged facial regions in the correct sequence so that the image can be recovered. However, because spatial and sequential manipulations can be combined in many ways, forged face images exhibit substantial discrepancies that severely degrade detection performance. In addition, recovering a forged image requires knowledge of the manipulation model in order to apply inverse transformations, and this knowledge is difficult to obtain because attackers often conceal the techniques they use. To address these issues, we propose the Multi-Collaboration and Multi-Supervision Network (MMNet), which handles the various spatial scales and sequential permutations in forged face images and achieves recovery without requiring knowledge of the corresponding manipulation method. Furthermore, existing evaluation metrics consider detection accuracy only at a single inference step and do not account for how well a prediction matches the ground truth over multiple consecutive steps. To overcome this limitation, we propose a novel evaluation metric, Complete Sequence Matching (CSM), which considers detection accuracy over multiple inference steps and thus reflects the ability to detect a forged sequence in its entirety. Extensive experiments on several typical datasets demonstrate that MMNet achieves state-of-the-art detection performance and independent recovery performance.
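
The abstract does not give a formula for CSM, so the following is only an illustrative sketch of the distinction it draws between scoring each inference step separately and requiring the whole predicted sequence to match the ground truth. It assumes that manipulation sequences are lists of integer labels padded to a fixed length with a "no manipulation" label; the names `NO_MANIP`, `per_step_accuracy`, and `complete_sequence_match` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence

# Hypothetical padding label meaning "no manipulation at this step" (an assumption).
NO_MANIP = 0


def pad(seq: Sequence[int], length: int) -> List[int]:
    """Pad a manipulation sequence with NO_MANIP up to a fixed length."""
    if len(seq) > length:
        raise ValueError("sequence is longer than the fixed length")
    return list(seq) + [NO_MANIP] * (length - len(seq))


def per_step_accuracy(preds, targets, max_len: int) -> float:
    """Fraction of individual steps predicted correctly, each step scored alone."""
    correct = 0
    total = 0
    for p, t in zip(preds, targets):
        p_pad, t_pad = pad(p, max_len), pad(t, max_len)
        correct += sum(a == b for a, b in zip(p_pad, t_pad))
        total += max_len
    return correct / total


def complete_sequence_match(preds, targets, max_len: int) -> float:
    """Fraction of samples whose entire predicted sequence equals the ground truth."""
    matches = sum(
        pad(p, max_len) == pad(t, max_len) for p, t in zip(preds, targets)
    )
    return matches / len(targets)


# Toy data: the second prediction is wrong at one step only.
preds = [[1, 2, 3], [1, 3], [2]]
targets = [[1, 2, 3], [1, 2], [2]]

step_acc = per_step_accuracy(preds, targets, max_len=3)
seq_match = complete_sequence_match(preds, targets, max_len=3)
print(f"per-step accuracy: {step_acc:.3f}, complete sequence match: {seq_match:.3f}")
```

On this toy data a single wrong step lowers the per-step score only slightly, while the sequence-level score drops by a full sample. That gap is the kind of difference a sequence-level metric is meant to expose. The definition used in the paper may differ from this sketch.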
