CityX: Controllable Procedural Content Generation for Unbounded 3D Cities

Urban areas, the primary human habitat in modern civilization, accommodate a broad spectrum of social activities. With the surge of embodied intelligence, recent years have witnessed a growing presence of physical agents in urban areas, such as autonomous vehicles and delivery robots. As a result, practitioners place high value on crafting authentic, simulation-ready 3D cities to facilitate the training and verification of such agents. This task is challenging, however: current generative methods fall short in diversity, controllability, or fidelity. In this work, we adopt procedural content generation (PCG) for high-fidelity generation. PCG assembles high-quality assets according to empirical rules, leading to industrial-grade outcomes. To ensure diverse and self-contained creation, we design a management protocol that accommodates a wide range of PCG plugins with distinct functions and interfaces. Based on this unified PCG library, we develop a multi-agent framework that transforms multi-modal instructions, including OpenStreetMap (OSM) data, semantic maps, and satellite images, into executable programs. These programs coordinate the relevant plugins to construct a 3D city consistent with the control condition. A visual feedback scheme then refines the initial outcomes. Our method, named CityX, demonstrates its superiority in creating diverse, controllable, and realistic 3D urban scenes. The synthetic scenes can be seamlessly deployed as a real-time simulator and an infinite data generator for embodied intelligence research. Our project page: https://cityx-lab.github.io.
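
To make the idea of a unified plugin library more concrete, the following is a minimal Python sketch of how plugins with different functions might be wrapped behind one interface and coordinated by a generated program. It is an illustration under stated assumptions, not the CityX implementation: the names `Plugin`, `PluginRegistry`, `make_roads`, `make_buildings`, and `build_demo_registry` are hypothetical, the "program" is simplified to an ordered list of plugin names, and the multi-agent planning and visual feedback stages are omitted.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Plugin:
    """Hypothetical uniform wrapper around one PCG plugin."""
    name: str
    description: str           # text an agent could read when choosing plugins
    inputs: List[str]          # scene keys the plugin reads
    outputs: List[str]         # scene keys the plugin writes
    run: Callable[..., Dict[str, Any]]


class PluginRegistry:
    """Hypothetical registry that stores plugins and executes programs."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, plugin: Plugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"duplicate plugin name: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def execute(self, program: List[str], scene: Dict[str, Any]) -> Dict[str, Any]:
        """Run the named plugins in order, checking inputs before each step."""
        scene = dict(scene)  # work on a copy so the caller's dict is untouched
        for step in program:
            plugin = self._plugins[step]
            missing = [key for key in plugin.inputs if key not in scene]
            if missing:
                raise KeyError(f"{step} is missing inputs: {missing}")
            result = plugin.run(**{key: scene[key] for key in plugin.inputs})
            scene.update({key: result[key] for key in plugin.outputs})
        return scene


def make_roads(osm: Dict[str, Any]) -> Dict[str, Any]:
    # Toy stand-in for a road generator: one road per OSM way.
    return {"roads": [f"road:{way}" for way in osm["ways"]]}


def make_buildings(osm: Dict[str, Any], roads: List[str]) -> Dict[str, Any]:
    # Toy stand-in for a building generator: pair each footprint with a road.
    pairs = [(fp, roads[i % len(roads)]) for i, fp in enumerate(osm["footprints"])]
    return {"buildings": pairs}


def build_demo_registry() -> PluginRegistry:
    registry = PluginRegistry()
    registry.register(Plugin("roads", "Lay out roads from OSM ways.",
                             ["osm"], ["roads"], make_roads))
    registry.register(Plugin("buildings", "Place buildings along roads.",
                             ["osm", "roads"], ["buildings"], make_buildings))
    return registry


if __name__ == "__main__":
    demo_scene = {"osm": {"ways": ["a", "b"], "footprints": ["x", "y", "z"]}}
    print(build_demo_registry().execute(["roads", "buildings"], demo_scene))
```

In this sketch, declaring each plugin's inputs and outputs lets the executor reject a program that runs `buildings` before `roads`, which is one simple way a shared protocol can keep heterogeneous plugins composable.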
