Synthesizing large-scale, explorable, and geometrically accurate 3D urban scenes is a challenging yet valuable task for enabling immersive and embodied applications. The main challenge is the lack of large-scale, high-quality real-world 3D scans for training generalizable generative models. In this paper, we take an alternative route to creating large-scale 3D scenes by combining readily available satellite imagery, which supplies realistic coarse geometry, with an open-domain diffusion model, which creates high-quality close-up appearances. We propose \textbf{Skyfall-GS}, the first city-block-scale 3D scene creation framework that requires no costly 3D annotations and supports real-time, immersive 3D exploration. We tailor a curriculum-driven iterative refinement strategy to progressively enhance geometric completeness and texture photorealism. Extensive experiments demonstrate that Skyfall-GS provides more cross-view-consistent geometry and more realistic textures than state-of-the-art approaches. Project page: https://skyfall-gs.jayinnn.dev/