Recent advancements in diffusion-based video generation have showcased remarkable results, yet the gap between synthetic and real-world videos remains under-explored. In this study, we examine this gap from three fundamental perspectives: appearance, motion, and geometry, comparing real-world videos with those generated by a state-of-the-art AI model, Stable Video Diffusion. To achieve this, we train three classifiers using 3D convolutional networks, each targeting a distinct aspect: vision foundation model features for appearance, optical flow for motion, and monocular depth for geometry. Each classifier exhibits strong performance in fake video detection, both qualitatively and quantitatively. This indicates that AI-generated videos are still easily detectable and that a significant gap between real and fake videos persists. Furthermore, using Grad-CAM, we pinpoint systematic failures of AI-generated videos in appearance, motion, and geometry. Finally, we propose an Ensemble-of-Experts model that integrates appearance, optical flow, and depth information for fake video detection, resulting in enhanced robustness and generalization ability. Our model detects videos generated by Sora with high accuracy, even without exposure to any Sora videos during training. This suggests that the gap between real and fake videos generalizes across video generative models. Project page: https://justin-crchang.github.io/3DCNNDetection.github.io/
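The three-expert design described above can be sketched as follows. This is a minimal illustrative PyTorch sketch, not the paper's implementation: the layer sizes, the per-cue channel counts (3 for appearance features, 2 for optical flow, 1 for depth), and the logit-averaging fusion rule are all assumptions for illustration.

```python
import torch
import torch.nn as nn


class Expert3DCNN(nn.Module):
    """A small 3D-CNN binary classifier over a video-shaped input volume.

    The channel count differs per cue (assumed here): C=3 for appearance
    features, C=2 for optical flow (u, v), C=1 for monocular depth.
    """

    def __init__(self, in_channels: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling over (T, H, W)
        )
        self.head = nn.Linear(32, 1)  # one logit: real vs. fake

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        z = self.features(x).flatten(1)
        return self.head(z)


class EnsembleOfExperts(nn.Module):
    """Fuses the three per-cue experts; averaging logits is an assumed
    fusion rule, the paper's actual scheme may differ."""

    def __init__(self):
        super().__init__()
        self.appearance = Expert3DCNN(in_channels=3)
        self.motion = Expert3DCNN(in_channels=2)
        self.geometry = Expert3DCNN(in_channels=1)

    def forward(self, rgb, flow, depth) -> torch.Tensor:
        logit = (
            self.appearance(rgb) + self.motion(flow) + self.geometry(depth)
        ) / 3
        return torch.sigmoid(logit)  # probability that the clip is fake
```

Keeping each cue in its own branch means a single expert can still be run (and inspected with Grad-CAM) in isolation, which matches the per-cue analysis described above.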