In this work, we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, our model can, at inference time, both compose scenes from predefined object parts and animate them in a plausible and controlled way. This is achieved by conditioning video generation during training on a randomly selected subset of local features from a pre-trained self-supervised model. We call our model CAGE, for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate the capabilities of CAGE in various settings. Project website: https://araachie.github.io/cage.
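As a rough illustration of conditioning on a randomly selected subset of local features, the sketch below masks a grid of per-patch feature vectors so that only a random subset is kept. It is a minimal sketch under stated assumptions, not the CAGE implementation: the function name `sample_sparse_conditioning`, the `keep_prob` parameter, the zero-filling of dropped patches, and the random array standing in for pre-trained self-supervised features are all hypothetical.

```python
import numpy as np


def sample_sparse_conditioning(features, keep_prob, rng):
    """Keep a random subset of local feature vectors and zero out the rest.

    features:  array of shape (H, W, D), one D-dimensional feature per patch.
    keep_prob: probability of keeping each patch (hypothetical hyperparameter).
    rng:       a numpy.random.Generator, passed in for reproducibility.

    Returns (masked_features, mask); mask is a boolean array of shape (H, W).
    """
    h, w, _ = features.shape
    # Independent Bernoulli draw per patch: True means the patch is kept.
    mask = rng.random((h, w)) < keep_prob
    # Broadcast the mask over the feature dimension; dropped patches become 0.
    masked = np.where(mask[..., None], features, 0.0)
    return masked, mask


rng = np.random.default_rng(0)
# Stand-in for patch features from a pre-trained self-supervised encoder
# (assumption: a 16x16 grid of 384-dimensional vectors).
features = rng.standard_normal((16, 16, 384))
masked, mask = sample_sparse_conditioning(features, keep_prob=0.1, rng=rng)
```

In a setup like this, a generator trained to reconstruct video from `masked` would see a different sparse subset at every step, which is one plausible way to end up with a model that accepts user-placed features at inference time.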