Large language models (LLMs) encode a vast amount of world knowledge acquired from massive text datasets. Recent studies have demonstrated that LLMs can assist an agent in solving complex sequential decision-making tasks in embodied environments by providing high-level instructions. However, interacting with LLMs can be time-consuming: in many practical scenarios, LLMs require so much storage that they can only be deployed on remote cloud servers. Additionally, commercial LLMs can be costly to use, since providers may charge based on usage frequency. In this paper, we explore how to enable efficient and cost-effective interactions between the agent and an LLM. We propose a mediator model based on reinforcement learning that determines when it is necessary to consult the LLM for high-level instructions to accomplish a target task. Experiments on four MiniGrid environments that require planning sub-goals demonstrate that our method can learn to solve target tasks with only a few necessary interactions with the LLM, significantly reducing interaction costs in testing environments compared with baseline methods. Experimental results also suggest that, by learning a mediator model to interact with the LLM, the agent's performance becomes more robust in both exploratory and stochastic environments.
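The abstract does not specify how the mediator is trained. As a rough illustration of the idea of learning *when* to ask, the sketch below uses tabular Q-learning (a swapped-in technique, not necessarily the authors' method) on a hypothetical toy task. The state is the agent's sub-goal progress plus whether its current LLM-provided plan is still valid. At each step the mediator either keeps the current plan or asks the LLM and pays a query cost. The environment dynamics, the costs, and the names `step` and `train_mediator` are illustrative assumptions.

```python
import random
from collections import defaultdict

KEEP, ASK = 0, 1  # mediator actions: follow the current plan, or query the LLM


def step(progress, plan_valid, action, rng, goal=5, p_stale=0.3,
         ask_cost=0.1, step_cost=0.05):
    """Hypothetical toy dynamics (an assumption, not the paper's environment).

    Asking always yields a valid plan but costs `ask_cost`. With a valid plan
    the agent completes one sub-goal, after which the plan goes stale with
    probability `p_stale`. With a stale plan and no query, the agent makes no
    progress. Returns (next_progress, next_plan_valid, reward, done).
    """
    reward = -step_cost
    if action == ASK:
        reward -= ask_cost
        plan_valid = True
    if not plan_valid:
        return progress, False, reward, False
    progress += 1
    if progress == goal:
        return progress, False, reward + 1.0, True
    return progress, rng.random() >= p_stale, reward, False


def train_mediator(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2,
                   max_steps=100, seed=0):
    """Epsilon-greedy tabular Q-learning over states (progress, plan_valid)."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(KEEP), Q(ASK)]
    for _ in range(episodes):
        state = (0, False)  # each episode starts without any instruction
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice((KEEP, ASK))
            else:
                action = max((KEEP, ASK), key=lambda a: q[state][a])
            progress, valid, reward, done = step(state[0], state[1], action, rng)
            next_state = (progress, valid)
            # Terminal transitions do not bootstrap from the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q


def greedy_action(q, state):
    """Return the mediator's learned decision for a state."""
    return "ask" if q[state][ASK] > q[state][KEEP] else "keep"


q = train_mediator()
for state in [(0, False), (2, True), (2, False)]:
    print(state, greedy_action(q, state))
```

In this toy setting, asking while the plan is still valid only adds `ask_cost`, whereas waiting with a stale plan burns `step_cost` and delays the final reward, so the greedy policy should converge to querying the LLM only in stale states. Raising `p_stale` makes stale states more common and should therefore increase how often the learned mediator asks.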