We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete, large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise manner along locations sampled from the floorplan, where previously generated images serve as conditions for the diffusion model to produce images at nearby locations. The global floorplan and the attention design in the diffusion model ensure the consistency of the generated images, from which a 3D scene can be reconstructed. Through extensive evaluation on the 3D-Front dataset, we demonstrate that HouseCrafter can generate high-quality house-scale 3D scenes. Ablation studies also validate the effectiveness of different design choices. We will release our code and model weights. Project page: https://neu-vi.github.io/houseCrafter/
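The batch-wise autoregressive generation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Pose`, `nearest_generated`, `generate_scene_views`, and the `denoise_batch` callback are hypothetical, poses are simplified to 2D floorplan locations, and the diffusion model is abstracted as a function that returns one RGB-D image per target pose given the conditioning views.

```python
import math
from typing import Callable, Dict, List, Tuple

Pose = Tuple[float, float]  # hypothetical: a 2D camera location on the floorplan


def nearest_generated(batch: List[Pose], generated: List[Pose], k: int) -> List[Pose]:
    """Return up to k already-generated poses closest to any pose in the batch."""
    def dist_to_batch(p: Pose) -> float:
        return min(math.dist(p, q) for q in batch)
    return sorted(generated, key=dist_to_batch)[:k]


def generate_scene_views(
    poses: List[Pose],
    batch_size: int,
    num_cond: int,
    denoise_batch: Callable[[List[Pose], List[Tuple[Pose, object]]], List[object]],
) -> Dict[Pose, object]:
    """Generate one image per pose, batch by batch, conditioning on nearby earlier views.

    `denoise_batch` stands in for the diffusion model (an assumption of this sketch):
    it receives the target poses and a list of (pose, image) conditions and returns
    one image per target pose.
    """
    generated: Dict[Pose, object] = {}
    for start in range(0, len(poses), batch_size):
        batch = poses[start:start + batch_size]
        # Pick the closest previously generated views as conditions (empty for the first batch).
        cond_poses = nearest_generated(batch, list(generated), num_cond)
        cond = [(p, generated[p]) for p in cond_poses]
        images = denoise_batch(batch, cond)
        for pose, image in zip(batch, images):
            generated[pose] = image
    return generated
```

In this sketch the first batch is generated without image conditions; in the approach described above, the floorplan provides global guidance throughout, which a real `denoise_batch` would also take as input.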