Generative models have emerged as an essential building block for many image synthesis and editing tasks. Recent advances in this field have also enabled the generation of high-quality 3D or video content that exhibits either multi-view or temporal consistency. In this work, we explore 4D generative adversarial networks (GANs) that learn unconditional generation of 3D-aware videos. By combining neural implicit representations with a time-aware discriminator, we develop a GAN framework that synthesizes 3D video supervised only with monocular videos. We show that our method learns a rich embedding of decomposable 3D structures and motions, which enables new visual effects in spatio-temporal rendering while producing imagery of quality comparable to that of existing 3D or video GANs.