Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research. Traditionally, RL environments run on the CPU, which limits their scalability with typical academic compute. However, recent advances in JAX have enabled wider use of hardware acceleration, making massively parallel RL training pipelines and environments possible. While this has been successfully applied to single-agent RL, it has not yet been widely adopted for multi-agent scenarios. In this paper, we present JaxMARL, the first open-source, Python-based library that combines GPU-enabled efficiency with support for a large number of commonly used MARL environments and popular baseline algorithms. Our experiments show that, in terms of wall clock time, our JAX-based training pipeline is around 14 times faster than existing approaches, and up to 12,500 times faster when multiple training runs are vectorized. This enables efficient and thorough evaluations, potentially alleviating the evaluation crisis in the field. We also introduce and benchmark SMAX, a JAX-based approximate reimplementation of the popular StarCraft Multi-Agent Challenge, which removes the need to run the StarCraft II game engine. This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL. The code is available at https://github.com/flairox/jaxmarl.
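The vectorization of multiple training runs mentioned above relies on JAX's `vmap` transformation, which maps a function over a batch axis and compiles it into a single accelerator dispatch. The snippet below is a minimal illustrative sketch (not JaxMARL's actual pipeline): the "training run" is a toy gradient descent on a quadratic, and `train_run` is a hypothetical function name, but the pattern of vmapping an entire run over random seeds is the same one that enables the reported speedups.

```python
import jax
import jax.numpy as jnp

def train_run(seed):
    # Toy stand-in for a full training run: gradient descent on
    # f(w) = (w - 3)^2, starting from a seed-dependent random init.
    key = jax.random.PRNGKey(seed)
    w = jax.random.normal(key)
    grad_fn = jax.grad(lambda w: (w - 3.0) ** 2)

    def step(w, _):
        return w - 0.1 * grad_fn(w), None

    # lax.scan runs the update loop inside a single compiled program.
    w, _ = jax.lax.scan(step, w, None, length=100)
    return w

# vmap maps the *entire* training run over a batch of seeds, so all
# runs execute in parallel on the accelerator with one dispatch.
seeds = jnp.arange(8)
final_ws = jax.vmap(train_run)(seeds)  # one converged weight per seed
```

Because the whole run (not just a single environment step) is vectorized, adding more seeds costs almost no extra wall clock time on a GPU until the device is saturated.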