Real-world sequential decision making is characterized by sparse rewards and large decision spaces, which pose significant difficulty for experiential learning systems such as $\textit{tabula rasa}$ reinforcement learning (RL) agents. Large Language Models (LLMs), with their wealth of world knowledge, can help RL agents learn quickly and adapt to distribution shifts. In this work, we introduce the Language Guided Exploration (LGE) framework, which uses a pre-trained language model (called GUIDE) to provide decision-level guidance to an RL agent (called EXPLORER). We observe that on ScienceWorld (Wang et al., 2022), a challenging text environment, LGE significantly outperforms vanilla RL agents and also outperforms more sophisticated methods such as Behaviour Cloning and Text Decision Transformer.