As DeepFake video manipulation techniques escalate and pose profound threats, the need for efficient detection strategies has become urgent. However, a particular issue is that facial images are often mis-detected, typically in degraded videos or under adversarial attacks. This introduces unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection that addresses these challenges with the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE), which is based on a graph convolutional network with a graph Laplacian. First, conventional convolutional neural networks are deployed to extract spatiotemporal features from the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with a sparse constraint. This constraint ensures that the essential features of valid face images in noisy face sequences are retained, thereby improving the stability and performance of DeepFake video detection. Furthermore, a graph Laplacian prior is incorporated into the graph convolutional network to remove noise patterns in the feature space and further improve performance. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
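The abstract describes GRACE only at a high level. The sketch below is a minimal illustration of the two generic ideas it names: a sparse graph over per-frame features, and a graph Laplacian prior that suppresses noise in feature space. It rests on several assumptions. First, the per-frame features are taken as already extracted, for example by a CNN. Second, the sparse constraint is approximated by a k-nearest-neighbour graph. Third, the Laplacian prior is applied as classic Laplacian smoothing, which minimizes 0.5*||X - Y||^2 + 0.5*lam*tr(X^T L X) with L = D - A, rather than inside a learned graph convolutional layer. All function names and the toy data are hypothetical. This is not the authors' GRACE implementation; that code is in the linked repository.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = frames, columns = feature dimensions


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 for zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def sparse_knn_graph(feats: Matrix, k: int) -> Matrix:
    """Assumed stand-in for a sparsity constraint: keep each frame's k most
    similar frames (negative similarities clipped to 0), then symmetrize."""
    n = len(feats)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = sorted(
            ((max(cosine(feats[i], feats[j]), 0.0), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            adj[i][j] = adj[j][i] = max(adj[i][j], s)
    return adj


def laplacian(adj: Matrix) -> Matrix:
    """Combinatorial graph Laplacian L = D - A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    return [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]


def dirichlet_energy(adj: Matrix, x: Matrix) -> float:
    """tr(X^T L X) = sum over pairs i < j of A_ij * ||x_i - x_j||^2."""
    n = len(adj)
    return sum(
        adj[i][j] * sum((a - b) ** 2 for a, b in zip(x[i], x[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )


def laplacian_smooth(y: Matrix, adj: Matrix, lam: float = 1.0, iters: int = 200) -> Matrix:
    """Gradient descent on 0.5*||X - Y||^2 + 0.5*lam*tr(X^T L X).
    The gradient is (X - Y) + lam * L X."""
    lap = laplacian(adj)
    n, d = len(y), len(y[0])
    max_deg = max(lap[i][i] for i in range(n))
    # The largest eigenvalue of L is at most 2 * max degree, so this step
    # size is below 1 / Lipschitz constant and every iteration is a descent step.
    step = 1.0 / (1.0 + lam * 2.0 * max_deg)
    x = [row[:] for row in y]
    for _ in range(iters):
        lx = [[sum(lap[i][j] * x[j][c] for j in range(n)) for c in range(d)] for i in range(n)]
        x = [
            [x[i][c] - step * ((x[i][c] - y[i][c]) + lam * lx[i][c]) for c in range(d)]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    # Hypothetical per-frame features; frame 3 mimics a mis-detected face.
    frames = [[1.0, 0.9, 0.1], [0.95, 1.0, 0.12], [1.0, 0.85, 0.15],
              [0.1, 0.2, 1.0], [0.9, 0.95, 0.1]]
    adj = sparse_knn_graph(frames, k=2)
    smoothed = laplacian_smooth(frames, adj, lam=1.0)
    print(dirichlet_energy(adj, frames), dirichlet_energy(adj, smoothed))
```

The smoothing term penalizes feature differences between frames connected in the sparse graph. An outlier frame is therefore pulled toward its neighbours, while the fidelity term keeps every frame close to its original features. A learned graph convolutional layer, as described in the abstract, would replace this fixed smoothing with trainable propagation.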