We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and just-in-time (JIT) compilation, Pgx scales efficiently to thousands of simulations running in parallel on accelerators. In our experiments on a DGX-A100 workstation, Pgx simulated RL environments 10-100x faster than existing Python RL libraries. Pgx includes environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators that let researchers accelerate their RL experiments. Pgx is available at https://github.com/sotetsuk/pgx.
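The combination of auto-vectorization and JIT compilation mentioned above can be illustrated with a minimal sketch. The game below is a hypothetical toy (a single pile of stones from which 1 to 3 are removed per move), and the names `init`, `step`, `batched_init`, and `batched_step` are assumptions made for illustration; this is not Pgx's API or implementation. It only shows how `jax.vmap` and `jax.jit` turn a single-game transition function into one that advances many games at once.

```python
import jax
import jax.numpy as jnp


def init(key):
    # Hypothetical toy game: start with a random pile of 10 to 20 stones.
    return jax.random.randint(key, (), 10, 21)


def step(pile, action):
    # Action in {0, 1, 2} removes 1 to 3 stones; the pile never drops below zero.
    return jnp.maximum(pile - (action + 1), 0)


# vmap lifts the single-game functions to a batch of games;
# jit compiles the batched functions for the available accelerator.
batched_init = jax.jit(jax.vmap(init))
batched_step = jax.jit(jax.vmap(step))

batch_size = 1024
key = jax.random.PRNGKey(0)
key, init_key = jax.random.split(key)
piles = batched_init(jax.random.split(init_key, batch_size))

# Play uniformly random moves until every game in the batch has finished.
num_steps = 0
while bool((piles > 0).any()):
    key, subkey = jax.random.split(key)
    actions = jax.random.randint(subkey, (batch_size,), 0, 3)
    piles = batched_step(piles, actions)
    num_steps += 1
```

The single-game functions are written without any batch dimension; the batch size is set only by the shape of the inputs passed to the transformed functions, which is what makes scaling to thousands of parallel games a matter of changing one number.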