Planning and performing interactive tasks, such as conducting experiments to determine the melting point of an unknown substance, is straightforward for humans but poses significant challenges for autonomous agents. We introduce ReasonPlanner, a novel generalist agent designed for reflective thinking, planning, and interactive reasoning. The agent uses large language models (LLMs) to plan hypothetical trajectories by building a World Model based on a Temporal Knowledge Graph. It interacts with the environment through a natural language actor-critic module, in which the actor translates the imagined trajectory into a sequence of actionable steps and the critic determines whether replanning is necessary. On the ScienceWorld benchmark, ReasonPlanner outperforms previous state-of-the-art prompting-based methods by a factor of more than 1.8, while being more sample-efficient and interpretable. It relies solely on frozen weights and therefore requires no gradient updates. ReasonPlanner can be deployed and used without specialized knowledge of Machine Learning, making it accessible to a wide range of users.
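To make the plan, act, critique loop described above more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the Temporal Knowledge Graph can be approximated by an append-only list of timestamped facts, and it replaces the LLM-backed planner, actor, and critic with plain Python callables. All names (`Fact`, `TemporalKG`, `run_episode`, `plan_fn`, `act_fn`, `critic_fn`, `env_step`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """One timestamped edge of the toy temporal knowledge graph (hypothetical)."""
    subject: str
    relation: str
    obj: str
    step: int


@dataclass
class TemporalKG:
    """Assumed stand-in for the world model: an append-only list of facts."""
    facts: List[Fact] = field(default_factory=list)

    def add(self, subject: str, relation: str, obj: str, step: int) -> None:
        self.facts.append(Fact(subject, relation, obj, step))

    def latest(self, subject: str, relation: str) -> Optional[str]:
        """Return the most recent object for (subject, relation), or None."""
        matches = [f for f in self.facts
                   if f.subject == subject and f.relation == relation]
        return max(matches, key=lambda f: f.step).obj if matches else None


def run_episode(
    plan_fn: Callable[[TemporalKG], List[str]],   # stands in for LLM planning
    act_fn: Callable[[str], str],                 # actor: plan step -> action
    critic_fn: Callable[[str, str], bool],        # critic: True means replan
    env_step: Callable[[str], str],               # environment: action -> observation
    kg: TemporalKG,
    max_steps: int = 10,
) -> List[Tuple[str, str]]:
    """Execute a plan step by step, replanning whenever the critic asks for it."""
    plan = plan_fn(kg)
    history: List[Tuple[str, str]] = []
    for step in range(max_steps):
        if not plan:
            break
        action = act_fn(plan.pop(0))
        observation = env_step(action)
        # Record the outcome so that later planning can condition on it.
        kg.add("agent", action, observation, step)
        history.append((action, observation))
        if critic_fn(action, observation):
            plan = plan_fn(kg)
    return history
```

In this sketch the only learned component would sit behind the callables, which matches the abstract's point that the agent relies on frozen weights: the world model is updated by appending facts, not by gradient steps.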