A Survey on Video Diffusion Models

The recent wave of AI-generated content (AIGC) has witnessed substantial success in computer vision, with the diffusion model playing a crucial role in this achievement. Due to their impressive generative capabilities, diffusion models are gradually superseding methods based on GANs and auto-regressive Transformers, demonstrating exceptional performance not only in image generation and editing, but also in the realm of video-related research. However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain. To address this gap, this paper presents a comprehensive review of video diffusion models in the AIGC era. Specifically, we begin with a concise introduction to the fundamentals and evolution of diffusion models. Subsequently, we present an overview of research on diffusion models in the video domain, categorizing the work into three key areas: video generation, video editing, and other video understanding tasks. We conduct a thorough review of the literature in these three key areas, including further categorization and practical contributions in the field. Finally, we discuss the challenges faced by research in this domain and outline potential future developmental trends. A comprehensive list of video diffusion models studied in this survey is available at https://github.com/ChenHsing/Awesome-Video-Diffusion-Models.

