We present WonderWorld, a novel framework for interactive 3D scene extrapolation that enables users to explore and shape virtual environments based on a single input image and user-specified text. While significant improvements have been made to the visual quality of scene generation, existing methods run offline, taking tens of minutes to hours to generate a scene. By leveraging Fast Gaussian Surfels and a guided diffusion-based depth estimation method, WonderWorld generates geometrically consistent scene extrapolations while significantly reducing computation time. Our framework generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. We demonstrate the potential of WonderWorld for applications in virtual reality, gaming, and creative design, where users can quickly generate and navigate immersive, potentially infinite virtual worlds from a single image. Our approach represents a significant advance in interactive 3D scene generation, opening up new possibilities for user-driven content creation and exploration in virtual environments. We will release full code and software for reproducibility. Project website: https://WonderWorld-2024.github.io/