Large Language Models (LLMs) possess general world knowledge but often struggle to generate precise predictions in structured, domain-specific contexts such as simulations. These limitations arise from their inability to ground their broad, unstructured understanding in specific environments. To address this, we present WorldLLM, a framework that enhances LLM-based world modeling by combining Bayesian inference with autonomous, curiosity-driven exploration via reinforcement learning. WorldLLM leverages the in-context learning abilities of LLMs to guide an LLM-based world model's predictions using natural language hypotheses given in its prompt. These hypotheses are iteratively refined through a Bayesian inference framework that uses a second LLM as the proposal distribution given the collected evidence. This evidence is gathered by a curiosity-driven reinforcement learning policy that explores the environment in search of transitions with low log-likelihood under the LLM-based predictive model given the current hypotheses. By alternating between refining hypotheses and collecting new evidence, our framework autonomously drives continual improvement of its predictions. Our experiments demonstrate the effectiveness of WorldLLM in a textual game environment that requires agents to manipulate and combine objects. The framework not only improves predictive accuracy, but also generates human-interpretable theories of the environment's dynamics.
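The alternating loop described above can be sketched in miniature as follows. This is an illustrative toy, not the paper's implementation: the two LLMs and the RL policy are replaced by simple stand-ins (a keyword-based log-likelihood `score_transition`, a random word-appending `propose`, and a greedy surprise-ranking "collector"), and all names and constants are assumptions. The refinement step uses a Metropolis-style accept/reject on the evidence log-likelihood, keeping the best hypothesis seen.

```python
import math
import random

random.seed(0)

# Toy pool of (action, observation) transitions from a crafting-style text game.
TRANSITIONS = [
    ("grasp wood", "you are holding wood"),
    ("combine wood and wood", "you crafted a plank"),
]


def score_transition(hypothesis, transition):
    # Stand-in for the frozen LLM world model: log-likelihood of the observed
    # outcome given the hypothesis in the prompt. Here: high probability if
    # the hypothesis mentions the outcome's final word, low otherwise.
    outcome_word = transition[1].split()[-1]
    return math.log(0.9 if outcome_word in hypothesis else 0.1)


def log_likelihood(hypothesis, evidence):
    return sum(score_transition(hypothesis, t) for t in evidence)


def propose(current):
    # Stand-in for the second LLM's proposal distribution over hypotheses.
    return current + " " + random.choice(["wood", "plank", "holding"])


def refine(hypothesis, evidence, steps=50):
    # Metropolis-style refinement: accept a candidate hypothesis with
    # probability min(1, exp(candidate_ll - current_ll)); track the best.
    best, best_ll = hypothesis, log_likelihood(hypothesis, evidence)
    cur, cur_ll = best, best_ll
    for _ in range(steps):
        cand = propose(cur)
        cand_ll = log_likelihood(cand, evidence)
        if math.log(random.random()) < cand_ll - cur_ll:
            cur, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = cur, cur_ll
    return best, best_ll


def curiosity_reward(hypothesis, transition):
    # Low log-likelihood under the current hypotheses => high reward.
    return -score_transition(hypothesis, transition)


def worldllm_loop(hypothesis, pool, rounds=2, batch=1):
    # Alternate: (1) collect the transitions most surprising under the current
    # hypothesis (greedy stand-in for the RL policy), (2) refine hypotheses.
    evidence = []
    for _ in range(rounds):
        ranked = sorted(pool, key=lambda t: curiosity_reward(hypothesis, t),
                        reverse=True)
        evidence.extend(ranked[:batch])
        hypothesis, ll = refine(hypothesis, evidence)
    return hypothesis, ll
```

The greedy surprise ranking stands in for the curiosity-driven policy: both direct data collection toward transitions the current hypotheses explain poorly, which is exactly the evidence that most constrains the next round of Bayesian refinement.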