The recent wave of AI-generated content (AIGC) has achieved substantial success in computer vision, and diffusion models have played a crucial role in this achievement. Owing to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers. They demonstrate exceptional performance not only in image generation and editing but also in video-related research. However, existing surveys focus mainly on diffusion models for image generation, and few up-to-date reviews cover their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. We begin with a concise introduction to the fundamentals and evolution of diffusion models. We then give an overview of research on diffusion models in the video domain and categorize the work into three key areas: video generation, video editing, and other video understanding tasks. We review the literature in each of these areas in depth, including a finer-grained categorization and a summary of the practical contributions to the field. Finally, we discuss the challenges facing research in this domain and outline potential future trends. A comprehensive list of the video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.