We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the model's creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: \url{cat-4d.github.io}.