Generative models have made significant advances in creating realistic videos, which raises security concerns. However, this emerging risk has not been adequately addressed, largely due to the absence of a benchmark dataset for AI-generated videos. In this paper, we first construct a video dataset with diverse semantic content using advanced diffusion-based video generation algorithms. In addition, typical lossy operations that videos undergo during network transmission are applied to generate degraded samples. Then, by analyzing the local and global temporal defects of current AI-generated videos, we build a novel detection framework that adaptively learns local motion information and global appearance variation to expose fake videos. Finally, experiments are conducted to evaluate the generalization and robustness of spatial- and temporal-domain detection methods; the results can serve as baselines and demonstrate the research challenges for future studies.