While reinforcement learning (RL) has shown remarkable success in decision-making problems, it often requires a large number of interactions with the environment, and in sparse-reward environments it is challenging to learn meaningful policies. Large Language Models (LLMs) can potentially provide valuable guidance to agents as they learn policies, thereby enhancing the performance of RL algorithms in such environments. However, LLMs often struggle to understand downstream tasks, which limits their ability to assist agents in these tasks optimally. A common approach to mitigating this issue is to fine-tune the LLM on task-related data so that it can offer useful guidance to RL agents. In practice, this approach faces several obstacles: the model weights may be inaccessible, or fine-tuning may demand significant computational resources. In this work, we introduce RLAdapter, a framework that builds a better connection between RL algorithms and LLMs by incorporating an adapter model. Within the RLAdapter framework, a lightweight language model is fine-tuned on information generated during the RL agent's training process; this significantly aids the LLM in adapting to downstream tasks, so that it provides better guidance to the RL agent. We conducted experiments to evaluate RLAdapter in the Crafter environment, and the results show that RLAdapter surpasses state-of-the-art (SOTA) baselines. Furthermore, agents trained under our framework exhibit common-sense behaviors that are absent in the baseline models.