We introduce \textit{WonderVerse}, a simple yet effective framework for generating extendable 3D scenes. Existing methods rely on iterative depth estimation and image inpainting, which often leads to geometric distortions and inconsistencies. In contrast, WonderVerse leverages the powerful world-level priors embedded in video generative foundation models to create highly immersive and geometrically coherent 3D environments. Furthermore, we propose a new technique for controllable 3D scene extension that substantially increases the scale of the generated environments. In addition, we introduce a novel abnormal sequence detection module that uses the camera trajectory to address geometric inconsistencies in the generated videos. Finally, WonderVerse is compatible with various 3D reconstruction methods, enabling generation that is both efficient and of high quality. Extensive experiments on 3D scene generation demonstrate that WonderVerse, with an elegant and simple pipeline, delivers extendable and highly realistic 3D scenes, markedly outperforming existing works that rely on more complex architectures.