Large language models (LLMs) are being applied as actors for sequential decision-making tasks in domains such as robotics and games, drawing on their general world knowledge and planning abilities. However, previous work has done little to explore which environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. Previous LLM actors sidestep this issue by relying on hand-engineered, task-specific protocols to determine which features of a state to communicate and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for task-conditioned state descriptions. We evaluate BLINDER on the challenging video game NetHack and on a robotic manipulation task. Our method improves task success rate, reduces input size and compute costs, and generalizes across LLM actors.
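To make the idea of selecting a concise state description with a value function more concrete, the following is a minimal sketch and not the BLINDER implementation. It assumes that a state is available as a list of short feature sentences, that a value function scores a (task, description) pair, and that a greedy search is used to build the description; all three are assumptions made for illustration. The names `select_description` and `toy_value_fn` are hypothetical, and `toy_value_fn` is a hand-written stand-in for a learned value function.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: scores a candidate description for a given task.
ValueFn = Callable[[str, Sequence[str]], float]


def select_description(
    task: str,
    features: Sequence[str],
    value_fn: ValueFn,
    max_features: int = 5,
) -> List[str]:
    """Greedily add the feature that most increases the value estimate.

    Stops when no remaining feature improves the value or when
    max_features is reached. The greedy strategy is an assumption here.
    """
    selected: List[str] = []
    remaining = list(features)
    best_value = value_fn(task, selected)
    while remaining and len(selected) < max_features:
        scored = [(value_fn(task, selected + [f]), f) for f in remaining]
        cand_value, cand = max(scored, key=lambda pair: pair[0])
        if cand_value <= best_value:
            break  # no feature helps; keep the description short
        selected.append(cand)
        remaining.remove(cand)
        best_value = cand_value
    return selected


def toy_value_fn(task: str, description: Sequence[str]) -> float:
    """Stand-in for a learned value function (assumption, not learned).

    Rewards task words covered by the description and charges a small
    penalty per included feature.
    """
    task_words = set(task.lower().split())
    covered = set()
    for feature in description:
        covered |= task_words & set(feature.lower().split())
    return len(covered) - 0.1 * len(description)


task = "pick up the red block"
features = [
    "the red block is on the table",
    "the window is open",
    "the gripper is empty",
    "a blue bowl is nearby",
]
chosen = select_description(task, features, toy_value_fn)
print(chosen)
```

In this toy setting, the selected description would be passed to the LLM actor in place of the full feature list. Swapping `toy_value_fn` for a trained model would leave the selection loop unchanged.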