Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an agent in solving complex sequential decision-making tasks in embodied environments by providing high-level instructions. However, interacting with LLMs can be time-consuming: in many practical scenarios, they require so much storage that they can only be deployed on remote cloud server nodes. Additionally, using commercial LLMs can be costly, since they may charge based on usage frequency. In this paper, we explore how to enable intelligent, cost-effective interactions between the agent and an LLM. We propose a reinforcement-learning-based mediator model that determines when it is necessary to consult the LLM for high-level instructions to accomplish a target task. Experiments on four MiniGrid environments that entail planning sub-goals demonstrate that our method can learn to solve target tasks with only a few necessary interactions with an LLM, significantly reducing interaction costs in testing environments compared with baseline methods. Experimental results also suggest that by learning a mediator model to interact with the LLM, the agent's performance becomes more robust against partial observability of the environment. Our code is available at https://github.com/ZJLAB-AMMI/LLM4RL.
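The interaction pattern described in the abstract, in which a mediator decides at each step whether to consult the LLM or keep following the current high-level instruction, can be illustrated with a minimal sketch. Everything below is a hypothetical illustration and is not taken from the released code: `CountingPlanner` stands in for a remote LLM, `ask_on_change` is a hand-written rule standing in for the learned mediator policy, and `shaped_reward` with its `ask_penalty` parameter is one assumed way to discourage unnecessary queries during reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

ASK, NOT_ASK = 1, 0

# A mediator maps (current observation, previous observation) to ASK or NOT_ASK.
Mediator = Callable[[int, Optional[int]], int]


@dataclass
class CountingPlanner:
    """Hypothetical stand-in for a remote LLM planner; counts its queries."""
    calls: int = 0

    def plan(self, observation: int) -> str:
        self.calls += 1
        return f"go_to_{observation}"


def always_ask(obs: int, prev_obs: Optional[int]) -> int:
    """Baseline: consult the planner at every step."""
    return ASK


def ask_on_change(obs: int, prev_obs: Optional[int]) -> int:
    """Hand-written rule standing in for a learned mediator policy:
    ask only when the observation differs from the previous one."""
    return ASK if obs != prev_obs else NOT_ASK


def shaped_reward(task_reward: float, asked: bool, ask_penalty: float) -> float:
    """Assumed reward shaping: subtract a fixed cost for each query."""
    return task_reward - (ask_penalty if asked else 0.0)


def run_episode(observations: List[int], mediator: Mediator,
                planner: CountingPlanner) -> List[str]:
    """Follow the current plan; re-query the planner only when there is
    no plan yet or the mediator returns ASK."""
    plan: Optional[str] = None
    prev_obs: Optional[int] = None
    plans: List[str] = []
    for obs in observations:
        if plan is None or mediator(obs, prev_obs) == ASK:
            plan = planner.plan(obs)
        plans.append(plan)
        prev_obs = obs
    return plans


if __name__ == "__main__":
    observations = [0, 0, 0, 1, 1, 2]
    for mediator in (always_ask, ask_on_change):
        planner = CountingPlanner()
        run_episode(observations, mediator, planner)
        print(mediator.__name__, "queries:", planner.calls)
```

In this toy setting both mediators yield the same sequence of plans, but the rule-based one issues fewer queries. A learned mediator would replace `ask_on_change` with a policy trained on a signal such as the assumed `shaped_reward`.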