Videos depict the evolution of complex dynamical systems over time as discrete image sequences. Generating controllable videos by learning the underlying dynamical system is an important yet underexplored topic in the computer vision community. This paper presents TiV-ODE, a novel framework that generates highly controllable videos from a static image and a text caption. Specifically, the framework leverages the ability of Neural Ordinary Differential Equations (Neural ODEs) to represent complex dynamical systems as a set of nonlinear ordinary differential equations. The resulting framework can generate videos with both the desired dynamics and the desired content. Experiments demonstrate that the proposed method generates highly controllable and visually consistent videos and that it can model dynamical systems. Overall, this work is a significant step towards developing advanced controllable video generation models that can handle complex and dynamic scenes.
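
To make the Neural ODE idea concrete, the following is a minimal sketch rather than the TiV-ODE implementation, whose architecture is not described in the abstract. It assumes a latent state `z` (standing in for an encoded static image) and a conditioning vector `c` (standing in for an encoded text caption), and integrates dz/dt = f(z, c) at a few time points. The names `make_dynamics` and `rk4_rollout`, the random weights that stand in for a trained network, and the fixed-step Runge-Kutta integrator are all illustrative assumptions. The image encoder and frame decoder are omitted.

```python
import numpy as np


def make_dynamics(latent_dim, cond_dim, hidden=32, seed=0):
    """Return f(z, c), a small MLP standing in for a learned vector field.

    Weights are random here (hypothetical); in a trained model they would
    be learned from video data.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.1, size=(latent_dim + cond_dim, hidden))
    w2 = rng.normal(scale=0.1, size=(hidden, latent_dim))

    def f(z, c):
        # The tanh nonlinearity makes dz/dt a nonlinear function of (z, c).
        return np.tanh(np.concatenate([z, c]) @ w1) @ w2

    return f


def rk4_rollout(f, z0, c, times):
    """Integrate dz/dt = f(z, c) with classical RK4; one state per time."""
    states = [z0]
    z = z0
    for t0, t1 in zip(times[:-1], times[1:]):
        h = t1 - t0
        k1 = f(z, c)
        k2 = f(z + 0.5 * h * k1, c)
        k3 = f(z + 0.5 * h * k2, c)
        k4 = f(z + h * k3, c)
        z = z + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        states.append(z)
    return np.stack(states)


z0 = np.zeros(8)                 # hypothetical latent code of the static image
c = np.ones(4)                   # hypothetical embedding of the text caption
f = make_dynamics(latent_dim=8, cond_dim=4)
times = np.linspace(0.0, 1.0, 5)  # five frame timestamps
traj = rk4_rollout(f, z0, c, times)  # one latent state per frame
```

Because the state is defined in continuous time, it can be queried at arbitrary timestamps by changing `times`. In a complete model, each row of `traj` would be decoded into a video frame.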