Multi-agent reinforcement learning (MARL) research is inherently computationally expensive, and it is often difficult to obtain enough experiment samples to test hypotheses and make robust statistical claims. Furthermore, MARL algorithms are typically complex in their design and can be tricky to implement correctly. These aspects of MARL make it challenging to create useful software for advanced research. Our criteria for such software are that it should be simple enough that new ideas can be implemented quickly, while also being scalable and fast enough to test those ideas in a reasonable amount of time. In this preliminary technical report, we introduce Mava, a research library for MARL written purely in JAX that aims to fulfill these criteria. We discuss the design and core features of Mava and demonstrate its use and performance across a variety of environments. In particular, we show that Mava offers a substantial speed advantage, with improvements of 10-100x over other popular MARL frameworks, while maintaining strong performance. This allows researchers to test ideas in a few minutes instead of several hours. Finally, Mava forms part of an ecosystem of libraries that integrate seamlessly with each other to help facilitate advanced research in MARL. We hope Mava will benefit the community and help drive scientifically sound and statistically robust research in the field. The open-source repository for Mava is available at https://github.com/instadeepai/Mava.