Video generation has progressed rapidly, benefiting from the high-quality renderings of powerful image generators. We regard video synthesis as generating a sequence of images that share the same content but vary in motion. However, most previous video synthesis frameworks built on pre-trained image generators treat content and motion generation separately, which leads to unrealistic generated videos. We therefore design a novel framework that builds a motion space, aiming at content consistency and fast convergence for video generation. We present MotionVideoGAN, a novel video generator that synthesizes videos based on the motion space learned by pre-trained image pair generators. First, we propose an image pair generator named MotionStyleGAN, which generates image pairs that share the same content but exhibit different motions. Next, we acquire motion codes that edit one image of a generated pair while keeping the other unchanged. Because the edited image shares its content with the unchanged image of the pair, these motion codes let us edit images within the motion space. Finally, we introduce a latent code generator that uses the motion codes to produce latent code sequences for video generation. Our approach achieves state-of-the-art performance on UCF101, the most complex video dataset ever used to evaluate unconditional video generation.
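
The last step, producing a latent code sequence from motion codes, can be illustrated with a toy sketch. This is not the MotionVideoGAN implementation: the abstract does not specify how motion codes are combined, so the additive form `w_t = w_0 + sum_k a[t][k] * m_k`, the function name `make_latent_sequence`, and the hand-written coefficient schedule (standing in for the learned latent code generator) are all assumptions made only for illustration.

```python
import random


def make_latent_sequence(w0, motion_codes, coeffs):
    """Return one latent code per frame.

    Assumed (hypothetical) form: w_t = w0 + sum_k coeffs[t][k] * motion_codes[k].
    w0 fixes the content; the motion codes only shift it along motion directions.
    """
    frames = []
    for frame_coeffs in coeffs:
        w_t = list(w0)  # start every frame from the same content code
        for a, m in zip(frame_coeffs, motion_codes):
            for i, m_i in enumerate(m):
                w_t[i] += a * m_i
        frames.append(w_t)
    return frames


rng = random.Random(0)
dim, num_codes, num_frames = 8, 3, 5

# Hypothetical content code and motion codes (random stand-ins, not learned).
w0 = [rng.gauss(0, 1) for _ in range(dim)]
motion_codes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_codes)]

# Stand-in for the learned latent code generator: coefficients grow with time,
# and are all zero for the first frame.
coeffs = [[0.1 * t * (k + 1) for k in range(num_codes)] for t in range(num_frames)]

sequence = make_latent_sequence(w0, motion_codes, coeffs)
```

In this sketch every frame starts from the same `w0`, which mirrors the stated goal of content consistency. In the actual method each latent code in the sequence would be passed to the pre-trained image generator to render one frame.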