Little multi-agent reinforcement learning (MARL) research on Google Research Football (GRF) focuses on the 11v11 multi-agent full-game scenario, and to the best of our knowledge, no open benchmark for this scenario has been released to the public. In this work, we fill the gap by providing a population-based MARL training pipeline and hyperparameter settings for the multi-agent football scenario that, trained from scratch, outperform the bot at difficulty 1.0 within 2 million steps. Our experiments serve as a reference for the expected performance, across various training configurations, of Independent Proximal Policy Optimization (IPPO), a state-of-the-art multi-agent reinforcement learning algorithm in which each agent optimizes its own policy independently. We also open-source our training framework, Light-MALib, which extends the MALib codebase with a distributed and asynchronous implementation and additional analytical tools for football games. Finally, we provide guidance for building strong football AI with population-based training and release diverse pretrained policies for benchmarking. The goal is to give the community a head start for anyone running experiments on GRF, together with a simple-to-use population-based training framework for further improving their agents through self-play. The implementation is available at https://github.com/Shanghai-Digital-Brain-Laboratory/DB-Football.
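To make the "independent" part of IPPO concrete, the following is a minimal NumPy sketch of the standard PPO clipped surrogate loss evaluated separately for each agent. It is not taken from Light-MALib: the function names `ppo_clip_loss` and `ippo_losses`, the batch keys, and the default `clip_eps=0.2` are assumptions made for illustration only, and the sketch says nothing about whether agents share network parameters.

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped PPO surrogate loss (to be minimized) for one agent's batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    new_logp = np.asarray(new_logp, dtype=float)
    old_logp = np.asarray(old_logp, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(new_logp - old_logp)            # importance ratio
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the elementwise minimum; negate to obtain a loss.
    return float(-np.mean(np.minimum(unclipped, clipped)))

def ippo_losses(batches, clip_eps=0.2):
    """One loss per agent, each computed only from that agent's own data.

    `batches` is a hypothetical layout: agent id -> dict with the keys
    "new_logp", "old_logp" and "adv".
    """
    return {
        agent_id: ppo_clip_loss(b["new_logp"], b["old_logp"], b["adv"], clip_eps)
        for agent_id, b in batches.items()
    }
```

Each agent's loss depends only on its own trajectories and advantage estimates; no joint action or centralized critic appears in the objective, which is what separates this independent formulation from centralized-critic variants.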