Large Language Models (LLMs) motivate generative agent simulation (e.g., AI Town) to create a ``dynamic world'', which holds immense value across entertainment and research. However, non-experts, especially those without programming skills, cannot easily customize a visualizable environment by themselves. In this paper, we introduce World Craft, an agentic world creation framework that creates an executable and visualizable AI Town from users' textual descriptions. It consists of two main modules, World Scaffold and World Guild. World Scaffold is a structured and concise standard for developing interactive game scenes, serving as an efficient scaffold for LLMs to customize an executable AI Town-like environment. World Guild is a multi-agent framework that progressively analyzes users' intents from rough descriptions and synthesizes the required structured contents (e.g., environment layout and assets) for World Scaffold. Moreover, we construct a high-quality error-correction dataset via reverse engineering to enhance spatial knowledge and improve the stability and controllability of layout generation, and we report multi-dimensional evaluation metrics for further analysis. Extensive experiments demonstrate that our framework significantly outperforms existing commercial code agents (Cursor and Antigravity) and LLMs (Qwen3 and Gemini-3-Pro) in scene construction and narrative intent conveyance, providing a scalable solution for the democratization of environment creation.