Recent work has successfully leveraged the ability of Large Language Models (LLMs) to capture abstract knowledge about the world's physics in order to solve decision-making problems. Yet the alignment between an LLM's knowledge and the environment can be poor, which limits functional competence because the model lacks grounding. In this paper, we study an approach (named GLAM) that achieves this alignment through functional grounding: we consider an agent that uses an LLM as a policy, and this policy is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning (RL) to improve its performance at achieving goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, together with a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can they boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.
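To make the idea of "an LLM as a policy" concrete, here is a minimal sketch of one way such a policy could be formed. It assumes the agent scores each candidate action's text with a log-likelihood given a textual prompt, then normalizes those scores into a distribution with a softmax. The abstract does not specify this mechanism, so treat it as an illustrative assumption. The names `action_distribution`, `select_action`, and `toy_score` are hypothetical, and `toy_score` is a word-overlap stand-in for a real LLM scoring call. The online RL update itself is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def action_distribution(log_likelihoods: Sequence[float]) -> List[float]:
    """Turn per-action log-likelihoods into probabilities (stable softmax)."""
    m = max(log_likelihoods)
    exps = [math.exp(x - m) for x in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(
    prompt: str,
    actions: Sequence[str],
    score: Callable[[str, str], float],
    rng: random.Random,
) -> Tuple[str, List[float]]:
    """Sample one action from the distribution induced by `score`."""
    scores = [score(prompt, a) for a in actions]
    probs = action_distribution(scores)
    idx = rng.choices(range(len(actions)), weights=probs, k=1)[0]
    return actions[idx], probs


def toy_score(prompt: str, action: str) -> float:
    """Hypothetical stand-in for an LLM log-likelihood: counts shared words."""
    shared = set(prompt.lower().split()) & set(action.lower().split())
    return float(len(shared))


if __name__ == "__main__":
    prompt = "Goal: go to the red door. You see a red door on your left."
    actions = ["turn left", "turn right", "go forward"]
    action, probs = select_action(prompt, actions, toy_score, random.Random(0))
    print(action, probs)
```

In an online RL setting, the sampled action would be sent to the environment, and the returned reward would drive an update of the model that produces the scores. That step depends on the chosen RL algorithm and is left out here.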