Large Language Models (LLMs) possess general world knowledge but often struggle to generate precise predictions in structured, domain-specific contexts such as simulations. These limitations arise from their inability to ground their broad, unstructured understanding in specific environments. To address this, we present WorldLLM, a framework that enhances LLM-based world modeling by combining Bayesian inference with autonomous, curiosity-driven exploration via reinforcement learning. WorldLLM leverages the in-context learning abilities of LLMs to guide an LLM-based world model's predictions using natural language hypotheses given in its prompt. These hypotheses are iteratively refined through a Bayesian inference framework that uses a second LLM as the proposal distribution given the collected evidence. This evidence is gathered by a curiosity-driven reinforcement learning policy that explores the environment in search of transitions with low log-likelihood under the LLM-based predictive model given the current hypotheses. By alternating between refining hypotheses and collecting new evidence, our framework autonomously drives continual improvement of its predictions. Our experiments demonstrate the effectiveness of WorldLLM in a textual game environment that requires agents to manipulate and combine objects. The framework not only improves predictive accuracy, but also generates human-interpretable theories of the environment's dynamics.
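The alternating loop described above can be sketched in miniature as follows. This is an illustrative toy, not the paper's implementation: the two LLMs and the RL policy are replaced by simple stand-ins (a keyword-based log-likelihood `score_transition`, a random word-appending `propose`, and a greedy surprise-ranking "collector"), and all names and constants are assumptions. The refinement step uses a Metropolis-style accept/reject on the evidence log-likelihood, keeping the best hypothesis seen.

```python
import math
import random

random.seed(0)

# Toy pool of (action, observation) transitions from a crafting-style text game.
TRANSITIONS = [
    ("grasp wood", "you are holding wood"),
    ("combine wood and wood", "you crafted a plank"),
]


def score_transition(hypothesis, transition):
    # Stand-in for the frozen LLM world model: log-likelihood of the observed
    # outcome given the hypothesis in the prompt. Here: high probability if
    # the hypothesis mentions the outcome's final word, low otherwise.
    outcome_word = transition[1].split()[-1]
    return math.log(0.9 if outcome_word in hypothesis else 0.1)


def log_likelihood(hypothesis, evidence):
    return sum(score_transition(hypothesis, t) for t in evidence)


def propose(current):
    # Stand-in for the second LLM's proposal distribution over hypotheses.
    return current + " " + random.choice(["wood", "plank", "holding"])


def refine(hypothesis, evidence, steps=50):
    # Metropolis-style refinement: accept a candidate hypothesis with
    # probability min(1, exp(candidate_ll - current_ll)); track the best.
    best, best_ll = hypothesis, log_likelihood(hypothesis, evidence)
    cur, cur_ll = best, best_ll
    for _ in range(steps):
        cand = propose(cur)
        cand_ll = log_likelihood(cand, evidence)
        if math.log(random.random()) < cand_ll - cur_ll:
            cur, cur_ll = cand, cand_ll
            if cur_ll > best_ll:
                best, best_ll = cur, cur_ll
    return best, best_ll


def curiosity_reward(hypothesis, transition):
    # Low log-likelihood under the current hypotheses => high reward.
    return -score_transition(hypothesis, transition)


def worldllm_loop(hypothesis, pool, rounds=2, batch=1):
    # Alternate: (1) collect the transitions most surprising under the current
    # hypothesis (greedy stand-in for the RL policy), (2) refine hypotheses.
    evidence = []
    for _ in range(rounds):
        ranked = sorted(pool, key=lambda t: curiosity_reward(hypothesis, t),
                        reverse=True)
        evidence.extend(ranked[:batch])
        hypothesis, ll = refine(hypothesis, evidence)
    return hypothesis, ll
```

The greedy surprise ranking stands in for the curiosity-driven policy: both direct data collection toward transitions the current hypotheses explain poorly, which is exactly the evidence that most constrains the next round of Bayesian refinement.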