Open-source reinforcement learning (RL) environments have played a crucial role in driving progress in the development of AI algorithms. Modern RL research needs simulated environments that are performant, scalable, and modular so that they can be applied to a wider range of real-world problems. We therefore present Jumanji, a suite of diverse RL environments designed to be fast, flexible, and scalable. The suite focuses on combinatorial problems frequently encountered in industry, as well as challenging general decision-making tasks. By leveraging the efficiency of JAX and hardware accelerators such as GPUs and TPUs, Jumanji enables rapid iteration on research ideas and large-scale experimentation, ultimately empowering more capable agents. Unlike existing RL environment suites, Jumanji is highly customizable, allowing users to tailor the initial state distribution and problem complexity to their needs. Furthermore, we provide actor-critic baselines for each environment, accompanied by preliminary findings on scaling and generalization scenarios. Jumanji aims to set a new standard for speed, adaptability, and scalability of RL environments.
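To make the design idea concrete, the following is a minimal sketch of how a stateless environment written in JAX can be batched and compiled for an accelerator, and how a swappable generator can control the initial state distribution and problem size. It is a toy 1-D task written for illustration only and is not Jumanji's API: the names `State`, `make_generator`, and `step`, and the parameters `size` and `batch_size`, are all assumptions introduced here.

```python
from typing import Callable, NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical environment state for a toy 1-D reach-the-target task."""
    position: jax.Array    # agent's cell on a line of `size` cells
    target: jax.Array      # goal cell
    step_count: jax.Array  # number of steps taken so far


def make_generator(size: int) -> Callable[[jax.Array], State]:
    """Build a generator that maps a PRNG key to an initial state.

    Swapping the generator changes the initial state distribution and the
    problem size without touching the dynamics in `step`.
    """
    def generator(key: jax.Array) -> State:
        pos_key, target_key = jax.random.split(key)
        return State(
            position=jax.random.randint(pos_key, (), 0, size),
            target=jax.random.randint(target_key, (), 0, size),
            step_count=jnp.zeros((), dtype=jnp.int32),
        )
    return generator


def step(state: State, action: jax.Array, size: int):
    """Pure transition function: action 1 moves right, any other action moves left."""
    delta = jnp.where(action == 1, 1, -1)
    position = jnp.clip(state.position + delta, 0, size - 1)
    next_state = State(position, state.target, state.step_count + 1)
    done = position == state.target
    reward = jnp.where(done, 1.0, 0.0)
    return next_state, reward, done


size = 16          # illustrative problem size
batch_size = 1024  # number of environments simulated in parallel
generator = make_generator(size)

# vmap lifts the single-environment functions to a batch; jit compiles them.
batched_reset = jax.jit(jax.vmap(generator))
batched_step = jax.jit(jax.vmap(lambda s, a: step(s, a, size)))

keys = jax.random.split(jax.random.PRNGKey(0), batch_size)
states = batched_reset(keys)
actions = jnp.ones((batch_size,), dtype=jnp.int32)  # every environment moves right
states, rewards, dones = batched_step(states, actions)
```

Because the state is passed explicitly and every function is pure, the same code runs unchanged on CPU, GPU, or TPU, and a different problem distribution only requires passing a different generator.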