We address the challenge of recovering the underlying geometry and colors of a scene from a sparse set of RGBD view observations. We present a new solution, termed RGBD$^2$, that sequentially generates novel RGBD views along a camera trajectory; the scene geometry is simply the fusion of these views. More specifically, we maintain an intermediate surface mesh from which new RGBD views are rendered. Each rendered view is incomplete and is then completed by an inpainting network; the completed RGBD view is later back-projected as a partial surface and merged into the intermediate mesh. The use of an intermediate mesh and camera projection helps address the difficult problem of multi-view inconsistency. In practice, we implement the RGBD inpainting network as a versatile RGBD diffusion model, a class of models previously used for 2D generative modeling, and we modify its reverse diffusion process to suit our use. We evaluate our approach on the task of 3D scene synthesis from sparse RGBD inputs; extensive experiments on the ScanNet dataset show that our approach outperforms existing ones. Project page: https://jblei.site/proj/rgbd-diffusion.
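The back-projection step mentioned above, in which an RGBD view is lifted into a partial surface, can be illustrated with a minimal sketch. It assumes a pinhole camera with intrinsics `K` and a 4x4 camera-to-world pose, and it produces a colored point set rather than a mesh. The function name `backproject_rgbd` and its parameters are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def backproject_rgbd(depth, rgb, K, cam_to_world):
    """Lift valid depth pixels to world-space points with colors.

    Assumptions (not from the paper): pinhole intrinsics K (3x3),
    a 4x4 camera-to-world pose, and depth == 0 marking missing pixels.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # v: row index, u: column index
    valid = depth > 0                  # skip pixels with no depth
    z = depth[valid]
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    # Homogeneous camera-space points, shape (N, 4)
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, rgb[valid]

# Tiny example: a single valid pixel at (u=1, v=0) with depth 2.
depth = np.array([[0.0, 2.0], [0.0, 0.0]])
rgb = np.zeros((2, 2, 3))
rgb[0, 1] = [1.0, 0.0, 0.0]
K = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[0, 3] = 1.0                       # camera shifted by +1 along world x
points, colors = backproject_rgbd(depth, rgb, K, pose)
```

In a sequential pipeline of the kind described, points obtained this way from each completed view would be merged into the running scene representation before the next view is rendered; how the merge is done (meshing, deduplication) is left open here.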