Achieving spatial intelligence requires moving beyond visual plausibility to build world simulators grounded in physical laws. While code-generating LLMs have advanced static 3D scene generation, extending this paradigm to 4D dynamics remains a critical frontier. The extension poses two fundamental challenges: (1) multi-scale context entanglement, in which monolithic generation fails to balance local object structure against global environmental layout; and (2) a semantic-physical execution gap, in which open-loop code generation produces physical hallucinations that lack dynamic fidelity. We introduce Code2Worlds, a framework that formulates 4D generation as language-to-simulation code generation. First, we propose a dual-stream architecture that disentangles retrieval-augmented object generation from hierarchical environmental orchestration. Second, to ensure dynamic fidelity, we establish a physics-aware closed-loop mechanism in which a PostProcess Agent scripts the dynamics and a VLM-Motion Critic performs self-reflection to iteratively refine the simulation code. Evaluations on the Code4D benchmark show that Code2Worlds outperforms baselines, achieving a 41% gain in SGS and 49% higher Richness, while uniquely generating physics-aware dynamics that prior static methods lack. Code: https://github.com/AIGeeksGroup/Code2Worlds. Website: https://aigeeksgroup.github.io/Code2Worlds.
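As a rough illustration of what "closed-loop" means here, the Python sketch below shows a generic generate, simulate, critique, refine loop: simulation code is generated, executed, judged by a critic, and regenerated with the critic's feedback until it is accepted or a round budget runs out. It is a minimal sketch under stated assumptions, not the Code2Worlds implementation. The callables `generate`, `simulate`, and `critique`, the `max_rounds` budget, and the toy gravity example are all hypothetical placeholders standing in for the PostProcess Agent, the physics simulator, and the VLM-Motion Critic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class RefinementResult:
    code: str                                          # last simulation code produced
    accepted: bool                                     # True if the critic approved it
    history: List[str] = field(default_factory=list)   # critic feedback from each round


def closed_loop_refine(
    prompt: str,
    generate: Callable[[str, str], str],                   # hypothetical: (prompt, feedback) -> simulation code
    simulate: Callable[[str], object],                     # hypothetical: code -> rollout / rendering
    critique: Callable[[str, object], Tuple[bool, str]],   # hypothetical critic: (prompt, rollout) -> (ok, feedback)
    max_rounds: int = 3,                                   # hypothetical refinement budget
) -> RefinementResult:
    """Regenerate simulation code until the critic accepts it or the budget is spent."""
    feedback = ""
    history: List[str] = []
    code = ""
    for _ in range(max_rounds):
        code = generate(prompt, feedback)    # open-loop step, conditioned on prior feedback
        rollout = simulate(code)             # execute the code to observe the dynamics
        ok, feedback = critique(prompt, rollout)
        history.append(feedback)
        if ok:                               # critic judges the motion physically plausible
            return RefinementResult(code, True, history)
    return RefinementResult(code, False, history)


if __name__ == "__main__":
    # Toy stand-ins: the "scene" is a single gravity constant.
    def gen(prompt: str, fb: str) -> str:
        return "gravity=-9.8" if "gravity" in fb else "gravity=0.0"

    def sim(code: str) -> float:
        return float(code.split("=")[1])

    def crit(prompt: str, g: float) -> Tuple[bool, str]:
        return (g < 0, "ok" if g < 0 else "objects float; add gravity")

    r = closed_loop_refine("a ball falls onto the floor", gen, sim, crit)
    print(r.accepted, r.code, len(r.history))  # prints: True gravity=-9.8 2
```

In the toy run, the first open-loop attempt omits gravity. The critic's feedback then steers the second attempt to a physically plausible version, which is the kind of correction a purely open-loop generator cannot make.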