We present DiffRoom, a novel framework for high-quality 3D indoor room reconstruction and generation, two tasks made challenging by the complexity and diversity of room geometry. Although diffusion-based generative models have demonstrated impressive performance in image generation and object-level 3D generation, they have not yet been applied to room-level 3D generation because of their high computational cost. In DiffRoom, we propose a sparse 3D diffusion network, based on a rough occupancy prior, that is efficient and has strong generative performance for Truncated Signed Distance Fields (TSDFs). Inspired by KinectFusion's incremental alignment and fusion of local SDFs, we propose a diffusion-based TSDF fusion approach that iteratively diffuses and fuses TSDFs, enabling the reconstruction and generation of an entire room environment. Additionally, to ease training, we introduce a curriculum diffusion learning paradigm that speeds up training convergence and enables high-quality reconstruction. According to a user study, the meshes generated by DiffRoom can even surpass the ground-truth meshes provided by ScanNet in quality. Please visit our project page for the latest progress and demonstrations: https://akirahero.github.io/DiffRoom/.
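For readers unfamiliar with the KinectFusion-style fusion that the abstract cites as inspiration, the following is a minimal sketch of the classic weighted running-average TSDF update. It is not DiffRoom's diffusion-based fusion, whose details the abstract does not give. The function names `truncate_sdf` and `fuse_tsdf`, the normalization to [-1, 1], and the per-voxel weight arrays are assumptions made for illustration only.

```python
import numpy as np


def truncate_sdf(sdf, trunc):
    """Scale signed distances by the truncation band and clamp to [-1, 1].

    Assumption: a normalized TSDF convention; other conventions exist.
    """
    return np.clip(sdf / trunc, -1.0, 1.0)


def fuse_tsdf(global_tsdf, global_weight, local_tsdf, local_weight):
    """Fuse a local TSDF into a global one by per-voxel weighted averaging.

    Voxels whose combined weight is zero (never observed) keep the
    global value unchanged.
    """
    total = global_weight + local_weight
    observed = total > 0
    fused = global_tsdf.copy()
    fused[observed] = (
        global_tsdf[observed] * global_weight[observed]
        + local_tsdf[observed] * local_weight[observed]
    ) / total[observed]
    return fused, total


# Toy 1D example: four voxels, with partial overlap between the two grids.
global_tsdf = np.array([1.0, 0.5, 0.0, 1.0])
global_weight = np.array([1.0, 1.0, 1.0, 0.0])
local_tsdf = np.array([1.0, 0.0, -0.5, -1.0])
local_weight = np.array([0.0, 1.0, 1.0, 1.0])

fused, weight = fuse_tsdf(global_tsdf, global_weight, local_tsdf, local_weight)
print(fused)   # fused TSDF values per voxel
print(weight)  # accumulated weights per voxel
```

In this toy case, voxels seen by only one grid take that grid's value, while voxels seen by both are averaged. An iterative scheme such as the one the abstract describes would presumably apply an update of this kind repeatedly as new local TSDFs become available.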