Multi-agent football poses an unsolved challenge in AI research. Existing work has either focused on simplified scenarios of the game or relied on expert demonstrations. In this paper, we develop a multi-agent system that plays the full 11 vs. 11 game mode without demonstrations. This game mode contains aspects that present major challenges to modern reinforcement learning algorithms: multi-agent coordination, long-term planning, and non-transitivity. To address these challenges, we present TiZero, a self-evolving multi-agent system that learns from scratch. TiZero introduces several innovations, including adaptive curriculum learning, a novel self-play strategy, and an objective that jointly optimizes the policies of multiple agents. Experimentally, it outperforms previous systems by a large margin on the Google Research Football environment, increasing win rates by over 30%. To demonstrate the generality of TiZero's innovations, we assess them on several environments beyond football: Overcooked, the Multi-agent Particle Environment, Tic-Tac-Toe, and Connect Four.