World-model synthesis aims to turn interaction experience into an internal model of environment dynamics. Existing symbolic approaches often fit observed transitions or mixtures of local rules, but they do not produce a complete executable program that can run independently of the real environment. We present Mind-Studio, a framework that uses large language models to synthesize executable pygame-style world models from state-action-next-state trajectories. Mind-Studio combines entropy-selected traces with a lightweight game skill file containing object, action, and static scene information extracted from screenshots. We evaluate synthesis quality with a K-step lookahead fidelity protocol that compares generated world-model rollouts against Real-ALE rollouts from the same state. On Montezuma's Revenge, Mind-Studio improves chosen-action next-state prediction accuracy from 0.3% for PoE-World to 48.7% while verifying 5 of 8 subgoals; across Alien, Assault, and Skiing, it achieves higher branch-level fidelity than prior learned lookahead sources.
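The K-step lookahead fidelity idea can be illustrated with a minimal sketch. The abstract does not specify the exact metric, so everything below is an assumption made for illustration: states are compared by exact equality, every length-K action sequence (branch) is enumerated from the same start state, and the names `rollout`, `k_step_fidelity`, `real_step`, and `model_step` are hypothetical. The two step functions are toy stand-ins, not the ALE or a synthesized pygame model.

```python
from itertools import product


def rollout(step, state, actions):
    """Apply `actions` in order from `state` and return the visited states."""
    states = []
    for action in actions:
        state = step(state, action)
        states.append(state)
    return states


def k_step_fidelity(real_step, model_step, start, actions, k):
    """Fraction of (branch, depth) pairs where the model state equals the real state.

    Assumption: exact-match comparison over all len(actions)**k branches.
    """
    matches = 0
    total = 0
    for branch in product(actions, repeat=k):
        real = rollout(real_step, start, branch)
        pred = rollout(model_step, start, branch)
        matches += sum(r == p for r, p in zip(real, pred))
        total += k
    return matches / total


def real_step(x, action):
    """Toy 'real' environment: walk on cells 0..4, clamped at both walls."""
    return max(0, min(4, x + (1 if action == 1 else -1)))


def model_step(x, action):
    """Toy 'synthesized' model that misses the right wall."""
    return max(0, x + (1 if action == 1 else -1))


score = k_step_fidelity(real_step, model_step, start=3, actions=[0, 1], k=2)
print(score)  # 0.875: only the second step of branch (1, 1) disagrees
```

In this toy setting, both rollouts start from the same state, which mirrors the "from the same state" condition in the protocol; with a real emulator this would require saving and restoring the environment state before each branch.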