Recent studies have shown that Large Language Models (LLMs) can solve complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents are limited in real-time dynamic environments because they are not specialized for the specific target problem. Moreover, deploying such agents is both costly and time-consuming in practical scenarios. In this paper, we introduce a novel framework that addresses these challenges by training a smaller, specialized student agent using instructions from an LLM-based teacher agent. By following the guided actions provided by the teacher, the student distills the prior knowledge of the LLM into a local model, and can therefore be trained with significantly less data. Subsequent training with environment feedback then enables the student agent to surpass the capabilities of its teacher. We conducted experiments on three challenging MiniGrid environments to evaluate the effectiveness of our framework. The results show that our approach improves sample efficiency and outperforms the baseline methods. Our code is available at https://github.com/ZJLAB-AMMI/LLM4Teach.
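The training scheme summarized in the abstract, with teacher guidance first and environment feedback taking over later, can be illustrated with a minimal sketch. It is not taken from the LLM4Teach repository. It assumes that the teacher's guidance is available as a probability distribution over actions, that distillation is expressed as a KL term added to an ordinary reinforcement learning loss, and that the weight on this term decays linearly to zero. The names `kl_divergence`, `teacher_weight`, `student_loss`, and `anneal_steps` are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) for two discrete distributions over the same action set."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def teacher_weight(step, anneal_steps, initial=1.0):
    """Assumed schedule: decay the teacher's influence linearly from `initial` to 0."""
    return initial * max(0.0, 1.0 - step / anneal_steps)

def student_loss(rl_loss, teacher_probs, student_probs, step, anneal_steps):
    """Combine the environment-driven RL loss with a teacher-matching term.

    Early in training the KL term dominates, so the student imitates the
    teacher. Once the weight reaches zero, only environment feedback remains,
    which is what allows the student to move beyond the teacher's policy.
    """
    lam = teacher_weight(step, anneal_steps)
    return rl_loss + lam * kl_divergence(teacher_probs, student_probs)
```

Under these assumptions, `student_loss(0.5, [1.0, 0.0], [0.5, 0.5], step=0, anneal_steps=100)` adds the full distillation penalty to the RL loss, while the same call with `step=100` returns the RL loss alone.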