Videos depict the evolution of complex dynamical systems over time as discrete image sequences. Generating controllable videos by learning the underlying dynamical system is an important yet underexplored topic in the computer vision community. This paper presents a novel framework, TiV-ODE, that generates highly controllable videos from a static image and a text caption. Specifically, our framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework can generate videos with both the desired dynamics and the desired content. Experiments demonstrate that the proposed method generates highly controllable and visually consistent videos and that it can model dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
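To make the Neural ODE idea concrete, the following is a minimal sketch of how a latent state could be evolved under a conditioned vector field and sampled once per frame. It is not the TiV-ODE implementation: the names `dynamics`, `rk4_step`, and `rollout`, the weight matrices `W` and `U`, and the toy values are all hypothetical. A single tanh layer stands in for a learned network, a fixed-step Runge-Kutta integrator stands in for whatever solver the framework uses, and the encoding of the image and caption and the decoding of states into frames are omitted.

```python
import math


def dynamics(z, c, W, U):
    """Hypothetical vector field dz/dt = tanh(W z + U c).

    z is the latent state and c is a control vector, assumed here to
    play the role of a text-caption embedding.
    """
    return [
        math.tanh(
            sum(w * zj for w, zj in zip(w_row, z))
            + sum(u * cj for u, cj in zip(u_row, c))
        )
        for w_row, u_row in zip(W, U)
    ]


def rk4_step(f, z, dt):
    """One classical fourth-order Runge-Kutta step for dz/dt = f(z)."""
    k1 = f(z)
    k2 = f([zi + 0.5 * dt * k for zi, k in zip(z, k1)])
    k3 = f([zi + 0.5 * dt * k for zi, k in zip(z, k2)])
    k4 = f([zi + dt * k for zi, k in zip(z, k3)])
    return [
        zi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for zi, a, b, c, d in zip(z, k1, k2, k3, k4)
    ]


def rollout(z0, c, W, U, num_frames, dt=0.1):
    """Integrate from the initial state z0 and return one state per frame."""
    def f(z):
        return dynamics(z, c, W, U)

    states = [list(z0)]
    for _ in range(num_frames - 1):
        states.append(rk4_step(f, states[-1], dt))
    return states


# Toy placeholder values (assumptions, not taken from the paper).
W = [[0.0, -1.0], [1.0, 0.0]]
U = [[0.5], [0.0]]
z0 = [1.0, 0.0]  # stands in for an encoded static image

traj_a = rollout(z0, [0.0], W, U, num_frames=5)
traj_b = rollout(z0, [1.0], W, U, num_frames=5)
```

In this sketch the two rollouts share the same initial state but use different control vectors, so their trajectories diverge after the first frame. This mirrors, in a very reduced form, the idea that the static image fixes the content while the caption steers the dynamics.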