The Transformer model has demonstrated success across a wide range of domains, including Multi-Agent Reinforcement Learning (MARL), where the Multi-Agent Transformer (MAT) has emerged as a leading algorithm. However, a significant drawback of Transformer models is their quadratic computational complexity with respect to input size, which makes them expensive to scale to larger inputs. This limitation restricts MAT's scalability in environments with many agents. Recently, State-Space Models (SSMs) have gained attention for their computational efficiency, but their application to MARL remains unexplored. In this work, we investigate the use of Mamba, a recent SSM, in MARL and assess whether it can match the performance of MAT while providing significant gains in efficiency. We introduce a modified version of MAT that incorporates standard and bi-directional Mamba blocks, as well as a novel "cross-attention" Mamba block. Extensive testing shows that our Multi-Agent Mamba (MAM) matches the performance of MAT across multiple standard multi-agent environments while scaling more effectively to scenarios with many agents. This is significant for the MARL community because it indicates that SSMs could replace Transformers without compromising performance, while also supporting more effective scaling to larger numbers of agents. Our project page is available at https://sites.google.com/view/multi-agent-mamba .
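To make the SSM building blocks mentioned above concrete, the sketch below shows a minimal linear state-space scan and a bi-directional variant that runs the scan in both directions over the agent sequence and sums the outputs. This is an illustrative simplification, not the paper's MAM implementation: the matrices `A`, `B`, `C`, the summation fusion, and the function names are assumptions for exposition, and real Mamba blocks additionally use input-dependent (selective) parameters.

```python
import numpy as np

def ssm_scan(u, A, B, C):
    """Minimal linear SSM recurrence: x_t = A x_{t-1} + B u_t, y_t = C x_t.

    u: (T, d_in) input sequence; A: (n, n); B: (n, d_in); C: (d_out, n).
    Each output y_t depends only on inputs u_1..u_t (causal, like a
    decoder-style block). Cost is linear in T, versus quadratic for
    self-attention over the same sequence.
    """
    x = np.zeros(A.shape[0])
    ys = []
    for t in range(u.shape[0]):
        x = A @ x + B @ u[t]   # state update
        ys.append(C @ x)       # readout
    return np.stack(ys)        # (T, d_out)

def bidirectional_ssm(u, A, B, C):
    """Bi-directional variant: scan forward and over the reversed
    sequence, then sum (one common fusion choice; the paper's exact
    fusion may differ). Every output now sees the whole sequence."""
    fwd = ssm_scan(u, A, B, C)
    bwd = ssm_scan(u[::-1], A, B, C)[::-1]
    return fwd + bwd
```

In a MAT-style encoder, a bi-directional block of this kind lets each agent's representation depend on all other agents, while the plain causal scan matches the decoder's autoregressive action generation.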