Photorealistic 3D scene generation is challenging due to the scarcity of large-scale, high-quality real-world 3D datasets and the complexity of manual modeling workflows, which demand specialized expertise. These constraints often result in slow iteration cycles, where each modification requires substantial effort, ultimately stifling creativity. We propose a fast, exemplar-driven framework for generating 3D scenes from a single casually captured input, such as a handheld video or drone footage. Our method first leverages 3D Gaussian Splatting (3DGS) to robustly reconstruct the input scene as a high-quality 3D appearance model. We then train a per-scene Generative Cellular Automaton (GCA) to produce a sparse volume of featurized voxels, amortizing scene generation while enabling controllability. A subsequent patch-based remapping step composites the complete scene from the exemplar's initial 3D Gaussian splats, recovering the appearance statistics of the input scene. The entire pipeline trains in under 10 minutes per exemplar and generates scenes in 0.5-2 seconds. Our method enables interactive creation with full user control, and we showcase complex 3D generation results from real-world exemplars within a self-contained interactive GUI.
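To make the generative step more concrete, the following is a minimal sketch of one Generative Cellular Automaton growth iteration on a sparse voxel grid, loosely following the GCA idea of stochastically growing occupancy outward from currently occupied voxels. Everything here (the name `gca_step`, the `predictor` callable, the toy demo) is an illustrative assumption, not the paper's implementation; in the actual method the predictor would be a learned sparse 3D network producing featurized voxels.

```python
import numpy as np

def gca_step(occupied, predictor, rng, grid_size=64):
    """One stochastic GCA growth step: propose the 6-neighborhood of every
    occupied voxel, score each candidate with a learned predictor, and
    sample the next sparse occupancy from those scores."""
    offsets = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                        [0, -1, 0], [0, 0, 1], [0, 0, -1]])
    # Candidate set: the occupied voxels plus their immediate neighbors.
    neighbors = (occupied[:, None, :] + offsets[None, :, :]).reshape(-1, 3)
    candidates = np.unique(
        np.clip(np.concatenate([occupied, neighbors]), 0, grid_size - 1),
        axis=0)
    # `predictor` stands in for the learned network; it maps each
    # candidate voxel to an occupancy probability.
    probs = predictor(candidates, occupied)
    return candidates[rng.random(len(candidates)) < probs]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    occupied = np.array([[32, 32, 32]])  # a single seed voxel
    toy_predictor = lambda cand, occ: np.full(len(cand), 0.6)
    for _ in range(8):  # a few growth iterations
        occupied = gca_step(occupied, toy_predictor, rng)
    print(occupied.shape[0], "voxels after growth")
```

Iterating this step from a user-provided seed is also what makes the generation controllable: the seed voxels act as a coarse layout that the automaton grows into a complete sparse volume.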
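The patch-based remapping step can be sketched in a similarly hedged way, assuming generated featurized voxels are matched to exemplar voxels by nearest-neighbor search in feature space and the exemplar's Gaussians are then copied over with a positional offset. The name `remap_voxels` and all shapes below are hypothetical; per-voxel matching stands in for the paper's patch-based matching.

```python
import numpy as np

def remap_voxels(gen_feats, gen_coords, ex_feats, ex_coords, ex_gaussians):
    """gen_feats:    (M, D) features of generated voxels
    gen_coords:   (M, 3) world positions of generated voxels
    ex_feats:     (N, D) features of exemplar voxels
    ex_coords:    (N, 3) world positions of exemplar voxels
    ex_gaussians: length-N list of (k_i, P) Gaussian parameter arrays
                  whose first three columns are the Gaussian means.
    Returns the exemplar Gaussians translated to the generated layout."""
    # Brute-force nearest neighbor in feature space; a k-d tree or an
    # approximate-NN index would replace this at scale.
    d2 = ((gen_feats[:, None, :] - ex_feats[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)  # (M,) best exemplar voxel per generated voxel
    out = []
    for i, j in enumerate(nn):
        g = ex_gaussians[j].copy()
        g[:, :3] += gen_coords[i] - ex_coords[j]  # shift Gaussian means
        out.append(g)
    return np.concatenate(out, axis=0)
```

Because the output is assembled directly from the exemplar's reconstructed splats, the composited scene inherits the input's appearance statistics rather than having to regenerate appearance from scratch.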