Video DeepFakes are fake media, created with Deep Learning (DL), that manipulate a person's expression or identity. Most current DeepFake detection methods analyze each frame independently, ignoring inconsistencies and unnatural movements between frames. Some newer methods employ optical flow models to capture this temporal aspect, but they are computationally expensive. In contrast, we propose using the related but often ignored Motion Vectors (MVs) and Information Masks (IMs) from the H.264 video codec to detect temporal inconsistencies in DeepFakes. Our experiments show that this approach is effective and incurs minimal computational cost compared with per-frame RGB-only methods. This could lead to new real-time, temporally-aware DeepFake detection methods for video calls and streaming.