The StarCraft Multi-Agent Challenge (SMAC) is one of the most widely used experimental environments in multi-agent reinforcement learning (MARL); the task is to control a fixed number of allied units to defeat enemy forces. Traditional MARL algorithms often require up to one million environment interaction steps to train a model, and the resulting policies are typically non-interpretable and transfer poorly. In this paper, we propose LLM-SMAC, a novel approach to solving SMAC tasks. In our framework, agents provide task descriptions to large language models (LLMs), which generate decision-tree code; the model is further refined through self-reflection using reward feedback from the environment. Experiments on SMAC demonstrate that our method produces high-quality, interpretable decision trees with minimal environmental exploration. Moreover, these models exhibit strong transferability, applying successfully to similar SMAC environments without modification. We believe this approach offers a new direction for solving decision-making tasks.
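The generate-evaluate-reflect loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the LLM call and the SMAC rollout are replaced with toy stubs (`query_llm` and `evaluate_policy` are hypothetical names), and only the loop structure — generate a decision-tree policy, evaluate it in the environment, feed the reward back into the next prompt — reflects the described framework.

```python
def query_llm(prompt: str) -> str:
    """Stub for an LLM call that returns decision-tree code as a string.
    This toy stub 'improves' whenever the prompt carries reward feedback."""
    if "reward:" in prompt:
        return ("def act(obs):\n"
                "    return 'focus_fire' if obs['enemy_hp'] < 50 else 'kite'")
    return "def act(obs):\n    return 'attack_nearest'"

def evaluate_policy(code: str) -> float:
    """Stub for rolling out the generated policy in SMAC.
    A real implementation would run episodes and return the episode reward."""
    namespace = {}
    exec(code, namespace)  # compile the generated decision tree
    act = namespace["act"]
    # Toy evaluation: the refined policy scores higher on a fake observation.
    return 1.0 if act({"enemy_hp": 30}) == "focus_fire" else 0.2

def llm_smac(task_description: str, max_iters: int = 3, target: float = 0.9):
    """Iteratively generate a decision-tree policy and self-reflect on reward."""
    prompt = task_description
    best_code, best_reward = None, float("-inf")
    for _ in range(max_iters):
        code = query_llm(prompt)
        reward = evaluate_policy(code)
        if reward > best_reward:
            best_code, best_reward = code, reward
        if reward >= target:
            break
        # Self-reflection: append the environment's reward signal to the prompt.
        prompt = f"{task_description}\nPrevious policy:\n{code}\nreward: {reward}"
    return best_code, best_reward

code, reward = llm_smac("Control 3 Marines to defeat 3 enemy Marines.")
```

Note that only a handful of environment evaluations are needed per iteration, which is where the "minimal environmental exploration" claim comes from: the search happens in the LLM's policy-code space rather than through gradient updates over millions of steps.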