3D world generation is essential for applications such as immersive content creation and autonomous driving simulation. Recent methods have shown promising results; however, they are constrained to grid layouts and suffer from inconsistent object scale across the generated world. In this work, we introduce Map2World, a novel framework that is the first to enable 3D world generation conditioned on user-defined segment maps of arbitrary shapes and scales, ensuring global scale consistency and flexibility across expansive environments. To further improve quality, we propose a detail enhancer network that generates fine details of the world. By incorporating global structure information, the detail enhancer adds fine-grained details without compromising overall scene coherence. We design the entire pipeline to leverage strong priors from asset generators, achieving robust generalization across diverse domains even with limited training data for scene generation. Extensive experiments demonstrate that our method significantly outperforms existing approaches in user controllability, scale consistency, and content coherence, enabling users to generate 3D worlds under more complex conditions.