This paper presents a new approach to detecting fake videos, based on analyzing style latent vectors and their abnormal temporal behavior in generated videos. We find that generated facial videos exhibit distinctive temporal changes in their style latent vectors, and that these changes are inevitable when generating temporally stable videos with varied facial expressions and geometric transformations. Our framework uses a StyleGRU module, trained with contrastive learning, to represent the dynamic properties of style latent vectors. We also introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of both visual and temporal artifacts. We evaluate our approach across various deepfake detection benchmarks, demonstrating its superiority in cross-dataset and cross-manipulation scenarios. Further analysis validates the importance of using temporal changes of style latent vectors to improve the generalization of deepfake video detection.