AI-generated videos have achieved near-perfect visual realism (e.g., Sora), making reliable detection mechanisms an urgent need. Detecting such videos remains challenging, however, because it requires modeling high-dimensional spatiotemporal dynamics and identifying subtle anomalies that violate physical laws. In this paper, we propose the first physics-driven paradigm for AI-generated video detection, based on probability flow conservation principles. Specifically, we introduce a statistic called the Normalized Spatiotemporal Gradient (NSG), which quantifies the ratio of spatial probability gradients to temporal density changes and thereby explicitly captures deviations from natural video dynamics. Leveraging pre-trained diffusion models, we develop an NSG estimator that combines spatial gradient approximation with motion-aware temporal modeling, avoiding complex motion decomposition while preserving physical constraints. Building on this estimator, we propose an NSG-based video detection method (NSG-VD) that uses the Maximum Mean Discrepancy (MMD) between the NSG features of test videos and those of real videos as its detection metric. Finally, we derive an upper bound on the NSG feature distance between real and generated videos, proving that generated videos exhibit amplified discrepancies due to distributional shifts. Extensive experiments show that NSG-VD outperforms state-of-the-art baselines by 16.00% in Recall and 10.75% in F1-Score. The source code is available at https://github.com/ZSHsh98/NSG-VD.
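To make the detection metric concrete, the following is a minimal sketch of an MMD computation between two sets of feature vectors. It is not the authors' implementation: the Gaussian kernel, the `bandwidth` value, the biased estimator, and the function names `gaussian_kernel` and `mmd_squared` are all assumptions chosen for illustration, and the random arrays merely stand in for NSG features, which in NSG-VD would come from the diffusion-based estimator.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    sq_dists = (
        np.sum(a * a, axis=1)[:, None]
        + np.sum(b * b, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth)
    k_yy = gaussian_kernel(y, y, bandwidth)
    k_xy = gaussian_kernel(x, y, bandwidth)
    return float(k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean())


# Placeholder features (assumption): 64 samples of 16-dimensional vectors each.
rng = np.random.default_rng(0)
real_ref = rng.normal(0.0, 1.0, size=(64, 16))      # reference "real" features
real_test = rng.normal(0.0, 1.0, size=(64, 16))     # test set from the same distribution
shifted_test = rng.normal(1.0, 1.0, size=(64, 16))  # test set from a shifted distribution

score_same = mmd_squared(real_ref, real_test, bandwidth=4.0)
score_shifted = mmd_squared(real_ref, shifted_test, bandwidth=4.0)
```

In this toy setting, `score_shifted` should come out larger than `score_same`, which mirrors the intuition in the abstract that a distributional shift enlarges the discrepancy. A detector built on this idea would compare the score against a threshold; how that threshold and the kernel are chosen in NSG-VD is not stated in the abstract.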