As DeepFake video manipulation techniques escalate and pose profound threats, the need for efficient detection strategies becomes urgent. One particular difficulty is that facial images are often mis-detected, frequently owing to degraded videos or adversarial attacks, introducing unexpected temporal artifacts that can undermine the efficacy of DeepFake video detection techniques. This paper introduces a novel method for robust DeepFake video detection, Graph-Regularized Attentive Convolutional Entanglement (GRACE), which builds on a graph convolutional network with a graph Laplacian to address these challenges. First, a conventional Convolutional Neural Network extracts spatiotemporal features for the entire video. The spatial and temporal features are then mutually entangled by constructing a graph with a sparsity constraint, ensuring that the essential features of valid face images are retained in noisy face sequences, thereby improving the stability and performance of DeepFake video detection. Furthermore, a graph Laplacian prior is introduced into the graph convolutional network to suppress noise patterns in the feature space and further improve performance. Comprehensive experiments demonstrate that the proposed method achieves state-of-the-art performance in DeepFake video detection under noisy face sequences. The source code is available at https://github.com/ming053l/GRACE.
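The two ingredients named above, a sparsity-constrained graph over frame features and a graph Laplacian prior that suppresses noise, can be illustrated with a minimal toy sketch. This is not the authors' GRACE implementation; the cosine-similarity affinity, the threshold value, and the single smoothing step are all simplifying assumptions chosen only to show how a Laplacian prior attenuates features from outlier (e.g. mis-detected) frames:

```python
import numpy as np

def sparse_affinity(features, threshold=0.2):
    """Cosine-similarity graph over per-frame features; weak edges are
    zeroed out, a simple stand-in for the sparsity constraint."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)
    return np.where(sim >= threshold, sim, 0.0)

def laplacian_smooth(features, adj, alpha=0.5):
    """One step of X <- X - alpha * L_sym X, where L_sym is the
    symmetric normalized graph Laplacian. Features inconsistent with
    their graph neighbours (noisy frames) are pulled toward them."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    d_inv_sqrt[mask] = deg[mask] ** -0.5
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return features - alpha * lap @ features

rng = np.random.default_rng(0)
frames = rng.normal(size=(8, 16))            # 8 frames, 16-dim toy CNN features
frames[3] += rng.normal(scale=5.0, size=16)  # simulate one noisy/mis-detected frame
adj = sparse_affinity(frames)
smoothed = laplacian_smooth(frames, adj)
print(smoothed.shape)
```

Under this sketch, the smoothing step shrinks the outlier frame's deviation toward its graph neighbours while leaving well-connected frames largely intact, which is the intuition behind using a Laplacian prior to denoise the feature space.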