Advances in 3D reconstruction have enabled high-quality 3D capture, but these methods require a user to collect hundreds to thousands of images to create a 3D scene. We present CAT3D, a method for creating anything in 3D by simulating this real-world capture process with a multi-view diffusion model. Given any number of input images and a set of target novel viewpoints, our model generates highly consistent novel views of a scene. These generated views can be used as input to robust 3D reconstruction techniques to produce 3D representations that can be rendered from any viewpoint in real time. CAT3D can create entire 3D scenes in as little as one minute, and outperforms existing methods for single-image and few-view 3D scene creation. See our project page for results and interactive demos at https://cat3d.github.io.