World-model synthesis aims to turn interaction experience into an internal model of environment dynamics. Existing symbolic approaches often fit observed transitions or mixtures of local rules, but they do not produce a complete executable program that can run independently of the real environment. We present Mind-Studio, a framework that synthesizes executable pygame-style world models from state-action-next-state trajectories using large language models. Mind-Studio combines entropy-selected traces with a lightweight game skill file containing object, action, and static scene information extracted from screenshots. We evaluate synthesis quality with a K-step lookahead fidelity protocol that compares generated world-model rollouts against Real-ALE rollouts from the same state. On Montezuma's Revenge, Mind-Studio improves chosen-action next-state prediction from 0.3% for PoE-World to 48.7% while verifying 5 of 8 subgoals; across Alien, Assault, and Skiing, it achieves stronger branch-level fidelity than prior learned lookahead sources.
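As a rough illustration of the K-step lookahead fidelity idea described above, the sketch below rolls out a reference environment and a candidate world model from the same start state along every K-step action branch and reports the fraction of branches on which the two rollouts agree. This is a minimal sketch under stated assumptions, not the evaluation code behind the reported numbers: the names `rollout`, `k_step_fidelity`, `real_step`, and `model_step` are hypothetical, the exact-match criterion is an assumed metric, and the one-dimensional toy environment stands in for a real emulator.

```python
from itertools import product
from typing import Callable, Hashable, List, Sequence

State = Hashable
Action = Hashable
StepFn = Callable[[State, Action], State]


def rollout(step: StepFn, state: State, actions: Sequence[Action]) -> List[State]:
    """Apply `actions` in order starting from `state`; return the visited states."""
    states = []
    for action in actions:
        state = step(state, action)
        states.append(state)
    return states


def k_step_fidelity(
    real_step: StepFn,
    model_step: StepFn,
    start: State,
    action_set: Sequence[Action],
    k: int,
) -> float:
    """Fraction of K-step action branches whose model rollout matches the
    reference rollout exactly at every step (assumed exact-match criterion)."""
    branches = list(product(action_set, repeat=k))
    matches = sum(
        rollout(model_step, start, branch) == rollout(real_step, start, branch)
        for branch in branches
    )
    return matches / len(branches)


# Toy stand-ins (hypothetical): an agent on a line with positions 0..4.
MOVES = {"L": -1, "N": 0, "R": 1}


def real_step(pos: int, action: str) -> int:
    """Reference dynamics: movement is clamped to the interval [0, 4]."""
    return max(0, min(4, pos + MOVES[action]))


def model_step(pos: int, action: str) -> int:
    """Candidate model that forgets the wall, so it drifts past position 4."""
    return pos + MOVES[action]


if __name__ == "__main__":
    for k in (1, 2):
        score = k_step_fidelity(real_step, model_step, 4, list(MOVES), k)
        print(f"K={k}: branch-level fidelity = {score:.3f}")
```

With K = 1 and only the action actually chosen by the agent, the same comparison reduces to a chosen-action next-state check; enumerating all branches, as here, grows exponentially in K, so a practical evaluation would likely sample or restrict branches.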