The misuse of deepfake technology by malicious actors poses a potential threat to nations, societies, and individuals. However, existing deepfake detection methods primarily target uncompressed videos and rely on cues such as noise characteristics, local textures, or frequency statistics. When applied to compressed videos, these methods suffer a drop in detection performance, which makes them less suitable for real-world scenarios. In this paper, we propose a deepfake video detection method based on 3D spatiotemporal trajectories. Specifically, we use a robust 3D model to construct spatiotemporal motion features, integrating feature details from both 2D and 3D frames to mitigate the influence of large head rotation angles or insufficient lighting within frames. Furthermore, we separate facial expressions from head movements and design a sequential analysis method based on phase space motion trajectories to explore the feature differences between genuine and fake faces in deepfake videos. We conduct extensive experiments on several compressed deepfake benchmarks to validate the performance of the proposed method. The robustness of the designed features is verified by checking that the distribution of facial landmarks remains consistent before and after video compression. Our method yields satisfactory results and shows potential for practical applications.
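To make the idea of a phase space motion trajectory concrete, the following is a minimal sketch using time-delay embedding, a common way to reconstruct a phase space from a one-dimensional sequence. The abstract does not state how the trajectories are built, so this technique, the function names `delay_embed` and `trajectory_length`, the parameters `dim` and `tau`, and the synthetic landmark signals are all assumptions made for illustration rather than the authors' method.

```python
import numpy as np


def delay_embed(signal, dim=3, tau=2):
    """Map a 1D sequence to points in a dim-dimensional phase space.

    Illustrative time-delay embedding; dim and tau are hypothetical defaults.
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("signal too short for the chosen dim and tau")
    # Row i is (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).
    return np.stack([x[k * tau:k * tau + n_points] for k in range(dim)], axis=1)


def trajectory_length(points):
    """Total Euclidean path length of a phase-space trajectory."""
    steps = np.diff(points, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


# Hypothetical input: one landmark coordinate tracked over 50 frames.
frames = np.arange(50)
smooth = np.sin(2 * np.pi * frames / 25)
# Same motion with a deterministic frame-to-frame jitter added.
jittery = smooth + 0.2 * (-1.0) ** frames

traj_smooth = delay_embed(smooth)
traj_jittery = delay_embed(jittery)

print(traj_smooth.shape)
print(trajectory_length(traj_smooth), trajectory_length(traj_jittery))
```

In this toy setting the jittery sequence traces a longer path through phase space than the smooth one. Whether genuine and fake faces differ in this particular way is not something the abstract claims; the sketch only shows how a per-frame motion signal can be turned into a trajectory and summarized by a simple statistic.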