As DeepFake video manipulation techniques escalate and pose profound threats, the need to develop efficient detection strategies becomes increasingly urgent. One particular issue lies in mis-detected facial images, often originating from degraded videos or adversarial attacks, which introduce unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, harnessing the proposed Graph-Regularized Attentive Convolutional Entanglement (GRACE), built on a graph convolutional network with a graph Laplacian, to address the aforementioned challenges. First, a conventional Convolutional Neural Network is deployed to extract spatiotemporal features for the entire video. Then, the spatial and temporal features are mutually entangled by constructing a graph with a sparsity constraint, enforcing that the essential features of valid face images remain in the noisy face sequences, thus augmenting the stability and performance of DeepFake video detection. Furthermore, a graph Laplacian prior is incorporated into the graph convolutional network to remove noise patterns in the feature space and further improve performance. Comprehensive experiments illustrate that our proposed method delivers state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
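The core pipeline described above, sparse graph construction over per-frame features followed by Laplacian-normalized graph convolution, can be sketched as below. This is a minimal illustrative stand-in, not the paper's exact GRACE formulation: the top-k affinity graph, the layer dimensions, and the function names (`sparse_affinity`, `gcn_layer`) are assumptions chosen for clarity.

```python
import numpy as np

def sparse_affinity(features, k=3):
    """Build a sparse affinity graph over frame features.

    Each node is one frame's feature vector; only the top-k most
    similar neighbours per node are kept (a simple stand-in for the
    paper's sparsity-constrained graph construction, which suppresses
    edges to noisy or mis-detected faces).
    """
    sim = features @ features.T
    np.fill_diagonal(sim, -np.inf)          # no self-edges at this stage
    A = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[-k:]      # indices of k strongest edges
        A[i, nbrs] = sim[i, nbrs]
    A = np.maximum(A, A.T)                  # symmetrise the graph
    return np.clip(A, 0.0, None)            # keep non-negative weights

def gcn_layer(X, A, W):
    """One graph-convolution layer: H = ReLU(D^-1/2 (A+I) D^-1/2 X W).

    The symmetric normalisation by the degree matrix D plays the role
    of the graph Laplacian regularisation, smoothing features over
    the graph and attenuating isolated noise patterns.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    P = D_inv_sqrt @ A_hat @ D_inv_sqrt     # normalised propagation matrix
    return np.maximum(P @ X @ W, 0.0)       # ReLU activation

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))            # 8 frames, 16-dim CNN features
A = sparse_affinity(X, k=3)
W = rng.standard_normal((16, 8))            # learnable weights (random here)
H = gcn_layer(X, A, W)
print(H.shape)                              # refined per-frame features, (8, 8)
```

In practice the refined node features `H` would be pooled over frames and fed to a real/fake classifier head; here the example only shows how the sparse graph and normalized propagation interact.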