Diffusion models have revolutionized image generation, and their extension to video generation has shown promise. However, current video diffusion models~(VDMs) rely on a scalar timestep variable applied at the clip level, which limits their ability to model the complex temporal dependencies needed for tasks such as image-to-video generation. To address this limitation, we propose a frame-aware video diffusion model~(FVDM), which introduces a novel vectorized timestep variable~(VTV). Unlike conventional VDMs, our approach allows each frame to follow an independent noise schedule, enhancing the model's capacity to capture fine-grained temporal dependencies. FVDM's flexibility is demonstrated across multiple tasks, including standard video generation, image-to-video generation, video interpolation, and long video synthesis. Through a diverse set of VTV configurations, we achieve superior quality in generated videos, overcoming challenges such as catastrophic forgetting during fine-tuning and limited generalizability in zero-shot methods. Our empirical evaluations show that FVDM outperforms state-of-the-art methods in video generation quality, while also excelling in extended tasks. By addressing fundamental shortcomings in existing VDMs, FVDM sets a new paradigm in video synthesis, offering a robust framework with significant implications for generative modeling and multimedia applications.
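To make the core idea concrete, the sketch below illustrates how a vectorized timestep differs from a scalar one in the diffusion forward process: instead of one timestep shared by the whole clip, each frame receives its own timestep and is noised according to its own schedule position. This is a minimal sketch of the general concept, not the paper's implementation; the linear-beta DDPM schedule, array shapes, and function names are illustrative assumptions.

```python
import numpy as np

def add_noise_per_frame(video, alphas_cumprod, t_vec, rng):
    """Forward-diffuse a clip with a per-frame (vectorized) timestep.

    video          : (F, C, H, W) clean frames
    alphas_cumprod : (T,) cumulative product of (1 - beta_t)
    t_vec          : (F,) independent timestep for each frame -- the VTV idea;
                     a conventional VDM would instead use a single scalar t
    """
    a = alphas_cumprod[t_vec].reshape(-1, 1, 1, 1)  # broadcast per frame
    noise = rng.standard_normal(video.shape)
    # standard DDPM forward process, applied frame-wise
    return np.sqrt(a) * video + np.sqrt(1.0 - a) * noise

# toy linear-beta schedule (illustrative, not the paper's schedule)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
video = rng.standard_normal((8, 3, 16, 16))   # 8 frames
t_vec = rng.integers(0, T, size=8)            # one timestep per frame
noisy = add_noise_per_frame(video, alphas_cumprod, t_vec, rng)
```

Special VTV configurations recover familiar settings: a constant `t_vec` reduces to a standard clip-level VDM, while pinning one frame's timestep near zero (keeping it almost clean) corresponds to conditioning tasks such as image-to-video generation.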