As multi-agent reinforcement learning (MARL) progresses towards solving larger and more complex problems, it becomes increasingly important that algorithms exhibit three key properties: (1) strong performance, (2) memory efficiency, and (3) scalability. In this work, we introduce Sable, a performant, memory-efficient, and scalable sequence-modeling approach to MARL. Sable adapts the retention mechanism of Retentive Networks (Sun et al., 2023) to achieve computationally efficient processing of multi-agent observations with long-context memory for temporal reasoning. Through extensive evaluations across six diverse environments, we demonstrate that Sable significantly outperforms existing state-of-the-art methods on the majority of tasks tested (34 out of 45). Furthermore, Sable maintains performance as the number of agents grows, handling environments with more than a thousand agents while exhibiting only a linear increase in memory usage. Finally, we conduct ablation studies to isolate the source of Sable's performance gains and to confirm its efficient computational memory usage.
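To make the core mechanism concrete, the sketch below is a minimal NumPy illustration of the recurrent form of retention from Retentive Networks (Sun et al., 2023), which Sable adapts. It is not Sable's actual implementation: the decay value `gamma`, the tensor shapes, and the function name are illustrative placeholders.

```python
import numpy as np

def recurrent_retention(Q, K, V, gamma=0.9):
    """Recurrent form of retention (Sun et al., 2023), for illustration.

    Q, K, V: arrays of shape (seq_len, d). Returns outputs of shape (seq_len, d).
    """
    seq_len, d = Q.shape
    S = np.zeros((d, d))  # recurrent state; O(d^2) memory regardless of seq_len
    outputs = np.zeros_like(V)
    for n in range(seq_len):
        # Decay the accumulated past and add the current key-value outer product.
        S = gamma * S + np.outer(K[n], V[n])
        # The query reads from the accumulated state.
        outputs[n] = Q[n] @ S
    return outputs
```

The recurrent form is mathematically equivalent to the parallel form, in which output $n$ is $\sum_{m \le n} \gamma^{n-m} (Q_n \cdot K_m)\, V_m$; the fixed-size state `S` is what makes recurrent processing memory-efficient at inference time, since memory does not grow with the length of the processed sequence.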