Recent studies have shown that Large Language Models (LLMs) can address complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents lack specialization for specific target problems, particularly in real-time dynamic environments, and deploying an LLM-based agent in practical scenarios can be both costly and time-consuming. Reinforcement learning (RL) approaches, on the other hand, train agents that specialize in the target task but often suffer from low sampling efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By incorporating the teacher's guidance, the student agent distills the prior knowledge of the LLM into its own model and can therefore be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses its teacher at the target task. We conducted experiments on the challenging MiniGrid and Habitat environments, designed for embodied AI research, to evaluate the effectiveness of our framework. The results demonstrate that our approach achieves superior performance compared to strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
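The distillation idea described above can be sketched as a per-step training loss that combines a standard policy-gradient term with a KL term pulling the student's action distribution toward the teacher's suggestion. This is only an illustrative sketch under assumed names: the function `distill_loss`, the fixed weight `lam`, and the use of a soft teacher distribution are assumptions for exposition, not the paper's exact objective.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of action logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def distill_loss(student_logits, teacher_probs, advantage, action, lam):
    """Toy combined objective for one transition.

    - Policy-gradient term: encourages the taken `action` in proportion
      to its estimated `advantage` (the usual RL signal).
    - KL term: penalizes divergence of the student's policy from the
      teacher's suggested action distribution, weighted by `lam`.
    Decaying `lam` over training would let the student eventually
    outgrow the teacher, as the framework intends.
    """
    probs = softmax(student_logits)
    pg = -advantage * np.log(probs[action] + 1e-8)  # RL term
    kl = np.sum(teacher_probs * np.log((teacher_probs + 1e-8) / (probs + 1e-8)))
    return pg + lam * kl
```

When the student already matches the teacher and the advantage is zero, the loss vanishes; a one-hot teacher suggestion that disagrees with the student yields a positive penalty, nudging the student toward the teacher's prior.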