We propose Pgx, a suite of board game reinforcement learning (RL) environments written in JAX and optimized for GPU/TPU accelerators. By leveraging JAX's auto-vectorization and parallelization over accelerators, Pgx scales efficiently to thousands of simultaneous simulations. In our experiments on a DGX-A100 workstation, we found that Pgx can simulate RL environments 10-100x faster than existing Python implementations. Pgx includes RL environments commonly used as benchmarks in RL research, such as backgammon, chess, shogi, and Go. Additionally, Pgx offers miniature game sets and baseline models to facilitate rapid research cycles. We demonstrate the efficient training of the Gumbel AlphaZero algorithm with Pgx environments. Overall, Pgx provides high-performance environment simulators for researchers to accelerate their RL experiments. Pgx is available at http://github.com/sotetsuk/pgx.
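To make the auto-vectorization claim concrete, the following is a minimal sketch of how a single-game transition function written in JAX can be batched over many simultaneous simulations with `jax.vmap` and compiled with `jax.jit`. The toy environment here (`init`, `step`, the dictionary state layout, and the batch size of 1024) is hypothetical and chosen only for illustration; it is not the Pgx API and performs no legality or win checks.

```python
import jax
import jax.numpy as jnp


def init(key):
    """Hypothetical toy reset: an empty 3x3 board (flattened) and player 1 to move."""
    del key  # unused here; kept so the signature resembles a seeded reset
    return {"board": jnp.zeros(9, dtype=jnp.int32), "player": jnp.int32(1)}


def step(state, action):
    """Hypothetical toy transition: mark cell `action` for the current player, then switch."""
    board = state["board"].at[action].set(state["player"])
    return {"board": board, "player": 3 - state["player"]}


# Assumed batch size, for illustration only.
batch_size = 1024
keys = jax.random.split(jax.random.PRNGKey(0), batch_size)

# vmap lifts the single-game functions to operate on a whole batch of games;
# jit compiles the batched functions for the available accelerator.
batched_init = jax.jit(jax.vmap(init))
batched_step = jax.jit(jax.vmap(step))

states = batched_init(keys)           # every leaf gains a leading batch axis
actions = jnp.arange(batch_size) % 9  # one action per simulated game
states = batched_step(states, actions)
```

The single-game logic is written once without any batch dimension, and the batch axis is added mechanically by `jax.vmap`, so the same code can run one game or thousands in a single compiled call.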