Large Language Models (LLMs) have demonstrated capabilities, such as planning and reasoning, that can benefit reinforcement learning (RL) models. However, how LLMs and RL models should collaborate remains an open problem. In this study, we address it with a teacher-student learning framework in a cooperative multi-agent setting: RL models provide feedback to LLMs, and LLMs provide high-level information to RL models. Within this framework, the LLM acts as the teacher and the RL model acts as the student. The two agents assist each other through a process of recursive help, summarized as "I help you help I help." The LLM agent supplies abstract information to the RL agent, enabling efficient exploration and policy improvement. In turn, the RL agent returns real-time feedback to the LLM agent, which helps it generate more useful tokens. This bi-directional feedback loop promotes optimization, exploration, and mutual improvement for both agents, enabling them to accomplish increasingly challenging tasks. We propose a practical algorithm for this setting and conduct empirical experiments to evaluate its effectiveness.
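To make the bi-directional loop described in the abstract concrete, the following is a minimal toy sketch and not the authors' algorithm. It assumes a hypothetical `StubTeacher` that stands in for the LLM (a simple bandit over two hand-written hints rather than a language model), a tabular Q-learning `QStudent`, and a six-state chain environment; all names, hyperparameters, and the environment are illustrative assumptions. The teacher sends a hint that biases the student's exploration, and the student's episode return is sent back as feedback that updates the teacher's preference over hints.

```python
import random

ACTIONS = (-1, +1)            # step left / step right on a 1-D chain
N_STATES = 6                  # states 0..5 (assumed toy environment)
START, GOAL = 0, N_STATES - 1
# Hypothetical "abstract information": each hint names a preferred direction.
HINTS = {"head_right": +1, "head_left": -1}


class StubTeacher:
    """Stand-in for the LLM teacher: an epsilon-greedy bandit over hints."""

    def __init__(self, rng, epsilon=0.2):
        self.rng, self.epsilon = rng, epsilon
        self.value = {h: 0.0 for h in HINTS}   # mean return seen per hint
        self.count = {h: 0 for h in HINTS}

    def advise(self):
        # Usually send the best-scoring hint, occasionally try a random one.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(HINTS))
        return max(self.value, key=self.value.get)

    def receive_feedback(self, hint, episode_return):
        # Student feedback: incremental mean of returns under this hint.
        self.count[hint] += 1
        self.value[hint] += (episode_return - self.value[hint]) / self.count[hint]


class QStudent:
    """Tabular Q-learning student whose exploratory steps follow the hint."""

    def __init__(self, rng, alpha=0.5, gamma=0.9, explore=0.5):
        self.rng, self.alpha, self.gamma, self.explore = rng, alpha, gamma, explore
        self.q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

    def greedy(self, state):
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def run_episode(self, hint, max_steps=20):
        state, total = START, 0.0
        for _ in range(max_steps):
            if self.rng.random() < self.explore:
                action = HINTS[hint]            # teacher-guided exploration
            else:
                action = self.greedy(state)
            nxt = min(max(state + action, 0), GOAL)
            reward = 1.0 if nxt == GOAL else -0.01
            best_next = 0.0 if nxt == GOAL else max(self.q[(nxt, a)] for a in ACTIONS)
            target = reward + self.gamma * best_next
            self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
            state, total = nxt, total + reward
            if state == GOAL:
                break
        return total


def greedy_path(student, max_steps=20):
    """States visited when the student acts greedily from START."""
    state, path = START, [START]
    while state != GOAL and len(path) <= max_steps:
        state = min(max(state + student.greedy(state), 0), GOAL)
        path.append(state)
    return path


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    teacher, student = StubTeacher(rng), QStudent(rng)
    for _ in range(episodes):
        hint = teacher.advise()                         # teacher -> student
        episode_return = student.run_episode(hint)      # student acts and learns
        teacher.receive_feedback(hint, episode_return)  # student -> teacher
    return teacher, student


if __name__ == "__main__":
    teacher, student = train()
    print("teacher hint values:", teacher.value)
    print("student greedy path:", greedy_path(student))
```

In this sketch the feedback signal is simply the episode return. A real LLM teacher would instead condition its next generation on richer feedback from the RL agent, which this toy does not attempt to model.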