Building a single generalist agent with zero-shot capability has recently driven significant advances in decision-making. Extending this capability to multi-agent scenarios, however, remains challenging. Most existing methods generalize poorly in the zero-shot setting because of two difficulties specific to multi-agent settings: a mismatch between centralized pretraining and decentralized execution, and varying agent numbers and action spaces, which make it difficult to learn representations that generalize across diverse downstream tasks. To overcome these challenges, we propose a \textbf{Mask}ed pretraining framework for \textbf{M}ulti-\textbf{a}gent decision making (MaskMA). The model is based on the transformer architecture and employs a mask-based collaborative learning strategy suited to decentralized execution under partial observation. Moreover, MaskMA integrates a generalizable action representation that divides the action space into actions directed at the agent's own information and actions related to other entities. This design allows MaskMA to handle tasks with varying numbers of agents and therefore different action spaces. Extensive experiments on SMAC show that MaskMA, with a single model pretrained on 11 training maps, achieves a 77.8% zero-shot win rate on 60 unseen test maps under decentralized execution, while also performing effectively on other types of downstream tasks (\textit{e.g.,} varied policies collaboration and ad hoc team play).
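The division of the action space into self-directed and entity-directed actions can be illustrated with a minimal sketch. The abstract does not specify how MaskMA computes action scores, so everything below is an assumption made for illustration: the function names `action_logits` and `greedy_action`, the use of dot products, and the toy embeddings are all hypothetical. The sketch only shows why such a split lets one policy head serve maps with different numbers of entities: the self-directed part has a fixed size, while the entity-directed part contributes one score per entity present.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def action_logits(
    agent_emb: Sequence[float],
    self_action_weights: Sequence[Sequence[float]],
    entity_embs: Sequence[Sequence[float]],
) -> List[float]:
    """Score actions for one agent (hypothetical scheme, not the paper's exact method).

    - Self-directed actions (e.g. move, stop): one fixed weight vector each,
      so this part has the same size on every map.
    - Entity-directed actions (e.g. interact with entity i): one score per
      entity, computed against that entity's embedding, so this part grows
      or shrinks with the number of entities on the map.
    """
    self_logits = [dot(w, agent_emb) for w in self_action_weights]
    entity_logits = [dot(e, agent_emb) for e in entity_embs]
    return self_logits + entity_logits


def greedy_action(logits: Sequence[float]) -> int:
    """Index of the highest-scoring action (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy example: the same weights are reused on a 1-entity and a 3-entity map.
agent = [1.0, 0.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # three self-directed actions
small_map = action_logits(agent, weights, [[2.0, 0.0]])
large_map = action_logits(agent, weights, [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
```

In this toy setup the action vector has four entries on the small map and six on the large one, yet no parameter changes shape between the two calls, which is the property the abstract attributes to the generalizable action representation.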