Recent studies have shown that Large Language Models (LLMs) can help solve complex sequential decision-making tasks by providing high-level instructions. However, LLM-based agents are limited in real-time, dynamic environments because they are not specialized for the specific target problem. Moreover, deploying such agents in practical scenarios is both costly and time-consuming. In this paper, we introduce a novel framework that addresses these challenges by training a smaller-scale, specialized student agent using instructions from an LLM-based teacher agent. By following the guided actions provided by the teacher, the student distills the LLM's prior knowledge into its own local model. Consequently, the student agent can be trained with significantly less data. Furthermore, subsequent training with environment feedback enables the student agent to surpass the capabilities of its teacher. We conduct experiments on three challenging MiniGrid environments to evaluate the effectiveness of our framework. The results demonstrate that our approach improves sample efficiency and outperforms baseline methods.
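One way to picture the two training phases described above (teacher guidance first, environment feedback later) is a student loss that mixes a reinforcement-learning term with a distillation term whose weight decays over time. The sketch below is only an illustration under stated assumptions: the linear annealing schedule, the KL direction, and the names `distill_weight`, `student_loss`, and `anneal_steps` are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over a discrete action set; terms with p_i = 0 are skipped."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def distill_weight(step, anneal_steps, initial=1.0):
    """Assumed schedule: linear decay from `initial` to 0 over `anneal_steps`."""
    if step >= anneal_steps:
        return 0.0
    return initial * (1.0 - step / anneal_steps)


def student_loss(rl_loss, student_logits, teacher_probs, step, anneal_steps):
    """Hypothetical combined objective: RL loss plus weighted distillation loss.

    Early in training the teacher term dominates, so the student imitates the
    LLM teacher's suggested actions. Once the weight reaches zero, only the
    environment-driven RL loss remains, so the student is no longer tied to
    the teacher's behavior.
    """
    lam = distill_weight(step, anneal_steps)
    student_probs = softmax(student_logits)
    distill = kl_divergence(teacher_probs, student_probs)
    return rl_loss + lam * distill
```

With this schedule, a student that disagrees with its teacher is penalized at step 0 but not after `anneal_steps`, which is one plausible way for the student to eventually surpass the teacher rather than merely copy it.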