Large language models (LLMs) provide a promising tool that enables robots to perform complex reasoning tasks. However, the limited context window of contemporary LLMs makes reasoning over long time horizons difficult. Embodied tasks, such as those that one might expect a household robot to perform, typically require that the planner consider information acquired a long time ago (e.g., properties of the many objects that the robot previously encountered in the environment). Attempts to capture the world state using an LLM's implicit internal representation are complicated by the paucity of task- and environment-relevant information available in a robot's action history, while methods that rely on conveying information to the LLM via the prompt are subject to its limited context window. In this paper, we propose Statler, a framework that endows LLMs with an explicit representation of the world state as a form of ``memory'' that is maintained over time. Integral to Statler is its use of two instances of general LLMs -- a world-model reader and a world-model writer -- that interface with and maintain the world state. By providing access to this world state ``memory'', Statler improves the ability of existing LLMs to reason over longer time horizons without the constraint of context length. We evaluate the effectiveness of our approach on three simulated table-top manipulation domains and a real robot domain, and show that it improves on the state of the art in LLM-based robot reasoning. Project website: https://statler-lm.github.io/
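The reader/writer split described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the JSON state format, the prompt layout, the function names (`read_step`, `write_step`, `run_episode`), and the two stub "LLMs" are all assumptions made for illustration. The stubs are plain Python functions standing in for real model calls. The point of the sketch is that each prompt contains only the current world state, never the full interaction history, so prompt length depends on the size of the state rather than on the number of steps taken.

```python
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def read_step(reader: LLM, state: Dict[str, str], query: str) -> str:
    """World-model reader: choose an action given only the current state."""
    prompt = f"State: {json.dumps(state, sort_keys=True)}\nQuery: {query}\nAction:"
    return reader(prompt).strip()


def write_step(writer: LLM, state: Dict[str, str], action: str) -> Dict[str, str]:
    """World-model writer: produce the state that results from the action."""
    prompt = f"State: {json.dumps(state, sort_keys=True)}\nAction: {action}\nNew state:"
    return json.loads(writer(prompt))


def run_episode(
    reader: LLM, writer: LLM, state: Dict[str, str], queries: List[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Alternate reader and writer; only the state is carried between steps."""
    actions = []
    for query in queries:
        action = read_step(reader, state, query)
        state = write_step(writer, state, action)
        actions.append(action)
    return actions, state


def stub_reader(prompt: str) -> str:
    # Toy stand-in for an LLM: maps "move <obj> to <place>" to an action string.
    query = prompt.split("Query: ")[1].split("\n")[0]
    _, obj, _, place = query.split()
    return f"move({obj},{place})"


def stub_writer(prompt: str) -> str:
    # Toy stand-in for an LLM: applies a move(...) action to the JSON state.
    state = json.loads(prompt.split("State: ")[1].split("\n")[0])
    action = prompt.split("Action: ")[1].split("\n")[0]
    obj, place = action[len("move("):-1].split(",")
    state[obj] = place
    return json.dumps(state)


if __name__ == "__main__":
    start = {"cup": "table", "book": "table"}
    actions, final = run_episode(
        stub_reader, stub_writer, start, ["move cup to shelf", "move book to desk"]
    )
    print(actions)
    print(final)
```

In a real system the two stubs would be replaced by calls to a language model, and the state would likely be richer than a flat object-to-location map; the loop structure is the part this sketch is meant to convey.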