Current Large Language Models (LLMs) exhibit a critical modal disconnect: they possess vast semantic knowledge but lack the procedural grounding to respect the immutable laws of the physical world. Consequently, while these agents implicitly function as world models, their simulations often suffer from physical hallucinations: they generate plans that are logically sound but physically unexecutable. Existing alignment strategies predominantly rely on resource-intensive training or fine-tuning, which attempts to compress dynamic environmental rules into static model parameters. However, such parametric encapsulation is inherently rigid and struggles to adapt to the open-ended variability of physical dynamics without continuous, costly retraining. To bridge this gap, we introduce WorldMind, a framework that autonomously constructs a symbolic World Knowledge Repository by synthesizing environmental feedback. Specifically, it unifies Process Experience, which enforces physical feasibility via prediction errors, with Goal Experience, which guides task optimality through successful trajectories. Experiments on EB-ALFRED and EB-Habitat demonstrate that WorldMind achieves superior performance over baselines, with remarkable cross-model and cross-environment transferability.
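To make the two experience types concrete, the following is a minimal, hypothetical sketch of a symbolic knowledge repository in the spirit described above: Process Experience entries are feasibility rules distilled when a predicted outcome diverges from environmental feedback, and Goal Experience entries are successful trajectories retained as optimality guidance. All class and method names here are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class WorldKnowledgeRepository:
    """Hypothetical symbolic store of environment knowledge (illustrative only)."""
    process_rules: list = field(default_factory=list)      # physical-feasibility constraints
    goal_trajectories: dict = field(default_factory=dict)  # task -> successful action sequence

    def add_process_experience(self, predicted: str, observed: str, rule: str) -> None:
        # A prediction error (mismatch between the agent's simulated outcome and
        # the environment's feedback) yields a symbolic feasibility rule.
        if predicted != observed:
            self.process_rules.append(rule)

    def add_goal_experience(self, task: str, trajectory: list, success: bool) -> None:
        # Only successful trajectories are kept as guidance toward task optimality.
        if success:
            self.goal_trajectories[task] = trajectory

    def retrieve(self, task: str) -> dict:
        # Both experience types are surfaced as context for the LLM planner.
        return {
            "constraints": list(self.process_rules),
            "reference_plan": self.goal_trajectories.get(task),
        }

# Illustrative usage with made-up household-environment feedback:
repo = WorldKnowledgeRepository()
repo.add_process_experience(
    predicted="microwave opens",
    observed="action failed: agent is holding an object",
    rule="cannot open the microwave while holding an object",
)
repo.add_goal_experience(
    task="heat apple",
    trajectory=["pick up apple", "put apple down", "open microwave"],
    success=True,
)
context = repo.retrieve("heat apple")
```

Because the repository is symbolic and external to the model, it can be updated from fresh feedback at inference time, which is the property the abstract contrasts with static parametric fine-tuning.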