With the rapid advancement of video generation, people can conveniently use video generation models to create videos tailored to their specific desires. Nevertheless, there are growing concerns about the potential misuse of such models in creating and disseminating false information. In this work, we introduce VGMShield: a set of three straightforward but pioneering mitigations spanning the lifecycle of fake video generation. We start with \textit{fake video detection}, asking whether generated videos exhibit distinctive artifacts and whether they can be differentiated from real videos; we then investigate the \textit{tracing} problem, which maps a fake video back to the model that generated it. To address both tasks, we propose leveraging pre-trained models that focus on {\it spatial-temporal dynamics} as the backbone to identify inconsistencies in videos. Through experiments on seven state-of-the-art open-source models, we demonstrate that current models still cannot perfectly handle spatial-temporal relationships, and thus we can accomplish detection and tracing with near-perfect accuracy. Furthermore, anticipating improvements in future generative models, we propose a {\it prevention} method that adds invisible perturbations to images so that the videos generated from them look unreal. Together with fake video detection and tracing, this multi-faceted set of solutions can effectively mitigate the misuse of video generative models.