Recent studies have shown that Large Language Models (LLMs) can address complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents are not specialized for a specific target problem, particularly in real-time dynamic environments. Deploying an LLM-based agent in practical scenarios can also be costly and time-consuming. Reinforcement learning (RL) approaches, by contrast, train agents that specialize in the target task but often suffer from low sample efficiency and high exploration costs. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student RL agent using instructions from an LLM-based teacher agent. By following the teacher's guidance, the student agent distills the prior knowledge of the LLM into its own model and can therefore be trained with significantly less data. Moreover, through further training with environment feedback, the student agent surpasses its teacher at completing the target task. We evaluate the framework on challenging MiniGrid and Habitat environments, which are specifically designed for embodied AI research. The results show that our approach outperforms strong baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
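The abstract does not state the training objective, so the following is only a minimal sketch of one way teacher guidance and environment feedback could be combined. It assumes the student minimizes an ordinary RL loss plus a KL term that pulls its action distribution toward the teacher's, with the KL weight decayed to zero so that later training relies on environment feedback alone. The function names, the direction of the KL term, and the linear schedule are all hypothetical illustrations, not taken from the paper or its repository.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-8) -> float:
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def distill_weight(step: int, anneal_steps: int, initial: float = 1.0) -> float:
    """Assumed schedule: linearly decay the guidance weight from `initial` to 0."""
    if anneal_steps <= 0:
        return 0.0
    return initial * max(0.0, 1.0 - step / anneal_steps)


def student_loss(
    rl_loss: float,
    teacher_probs: Sequence[float],
    student_probs: Sequence[float],
    step: int,
    anneal_steps: int,
) -> float:
    """Assumed objective: RL loss plus an annealed teacher-matching penalty."""
    lam = distill_weight(step, anneal_steps)
    return rl_loss + lam * kl_divergence(teacher_probs, student_probs)


if __name__ == "__main__":
    teacher = [0.9, 0.1]   # hypothetical teacher action distribution
    student = [0.5, 0.5]   # hypothetical student action distribution
    for step in (0, 50, 100):
        print(step, student_loss(0.3, teacher, student, step, anneal_steps=100))
```

Under this assumed schedule the teacher term dominates early, when the student has little experience, and vanishes after `anneal_steps`, which is one way a student could eventually outperform the teacher it learned from.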