With the widespread use of VR devices and content, demand for 3D scene generation techniques is growing. Existing 3D scene generation models, however, limit the target scene to a specific domain, primarily because they are trained on 3D scan datasets that are far from the real world. To address this limitation, we propose LucidDreamer, a domain-free scene generation pipeline that fully leverages the power of existing large-scale diffusion-based generative models. LucidDreamer alternates between two steps: Dreaming and Alignment. First, to generate multi-view consistent images from the inputs, we use the point cloud as a geometric guideline for each image generation. Specifically, we project a portion of the point cloud onto the desired view and provide the projection as guidance for inpainting with the generative model. The inpainted images are lifted to 3D space using estimated depth maps, forming new points. Second, to aggregate the new points into the 3D scene, we propose an alignment algorithm that harmoniously integrates the newly generated portions of the 3D scene. The resulting 3D scene serves as the initial points for optimizing Gaussian splats. LucidDreamer produces Gaussian splats that are highly detailed compared to previous 3D scene generation methods, with no constraint on the domain of the target scene. Project page: https://luciddreamer-cvlab.github.io/
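The geometric part of the Dreaming step (projecting existing points into a new view, then lifting a depth map back to 3D) can be illustrated with a minimal pinhole-camera sketch. This is not the authors' implementation: the function names `project_points` and `lift_to_3d` are hypothetical, points are assumed to be already expressed in the target camera's frame (extrinsics are omitted), and the diffusion-based inpainting and depth estimation steps are not modeled at all. The mask returned here only marks pixels that no existing point covers, which is the region an inpainting model would be asked to fill.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]
DepthMap = List[List[Optional[float]]]


def project_points(points: List[Point], fx: float, fy: float,
                   cx: float, cy: float, width: int, height: int):
    """Project camera-frame 3D points into a depth buffer (pinhole model).

    Returns (depth, mask). depth[v][u] holds the nearest z that lands on
    pixel (u, v), or None. mask[v][u] is True where no point landed, i.e.
    the region that would be handed to an inpainting model.
    """
    depth: DepthMap = [[None] * width for _ in range(height)]
    for x, y, z in points:
        if z <= 0:  # skip points behind the camera
            continue
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            # keep the closest point per pixel (simple z-buffer)
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    mask = [[d is None for d in row] for row in depth]
    return depth, mask


def lift_to_3d(depth: DepthMap, fx: float, fy: float,
               cx: float, cy: float) -> List[Point]:
    """Back-project every pixel with a known depth to a camera-frame 3D point."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

In a full pipeline, the `None` entries of the depth buffer would be filled from a depth estimate of the inpainted image before calling `lift_to_3d`, and the lifted points would then be aligned with the existing point cloud; both of those steps are outside the scope of this sketch.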