Recent work has successfully leveraged the ability of Large Language Models (LLMs) to capture abstract knowledge about the physics of the world in order to solve decision-making problems. However, this knowledge may be misaligned with the actual environment, limiting functional competence due to a lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance at solving goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can they boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.
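The grounding loop described above (an LLM acting as a policy over textual actions, progressively updated via online RL) can be illustrated with a toy sketch. This is a hedged illustration, not the paper's actual method: the environment, action strings, learning rate, and the per-action logit table standing in for an LLM's action log-probabilities are all hypothetical simplifications, and the update is plain REINFORCE rather than the full training setup used with FLAN-T5.

```python
# Toy sketch of functional grounding via online policy-gradient updates.
# A stand-in logit table replaces a real LLM: in GLAM-style setups, the
# policy's per-action scores would come from the LLM's log-probabilities
# of each candidate action string given the textual observation.
import math
import random

random.seed(0)
ACTIONS = ["go left", "go right"]  # hypothetical textual action space

# Stand-in for LLM scoring: one learnable logit per (observation, action).
logits = {("at start", a): 0.0 for a in ACTIONS}

def policy(obs):
    """Softmax over per-action scores, yielding a distribution over actions."""
    scores = [logits[(obs, a)] for a in ACTIONS]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def step(obs, action):
    # Hypothetical one-step environment: "go right" reaches the goal.
    return 1.0 if action == "go right" else 0.0

lr = 0.5
for _ in range(200):  # online interaction loop
    obs = "at start"
    probs = policy(obs)
    a_idx = random.choices(range(len(ACTIONS)), weights=probs)[0]
    reward = step(obs, ACTIONS[a_idx])
    # REINFORCE: grad of log pi(a|obs) w.r.t. logit_i is 1{i=a} - pi(i|obs)
    for i, a in enumerate(ACTIONS):
        grad = (1.0 if i == a_idx else 0.0) - probs[i]
        logits[(obs, a)] += lr * reward * grad
```

After training, the policy concentrates probability mass on the rewarded action, mirroring how online RL progressively aligns the (stand-in) language policy with the environment's dynamics.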