In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Despite their outstanding performance in familiar environments, these systems often falter in new situations because they overfit during training. The problem is especially pronounced in settings where both cooperative and competitive behaviours are present, which gives the overfitting and generalisation challenges a dual nature. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, using the target policy's regret to gauge how strongly each setting exposes that policy's vulnerabilities. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID to generate a diverse array of adversarial settings for TiZero, the state-of-the-art approach that "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.
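To make the idea of a regret-guided search for diverse adversarial settings concrete, the following is a minimal sketch rather than MADRID's actual implementation. It assumes a MAP-Elites-style archive (one reading of "illuminated diversity"), a toy two-dimensional scenario space, and regret estimated as the gap between a reference return and the target policy's return. The functions `target_return`, `reference_return`, `descriptor`, and `mutate` are all hypothetical stand-ins; in a real setting the returns would come from rollouts in the environment.

```python
import random


def target_return(level):
    # Hypothetical stand-in for the target policy's return in a scenario.
    # The toy policy does worse as both scenario features grow.
    x, y = level
    return 1.0 - x * y


def reference_return(level):
    # Hypothetical stand-in for a reference policy's return (assumed constant).
    return 1.0


def regret(level):
    # Estimated regret: how much better the reference does than the target.
    return reference_return(level) - target_return(level)


def descriptor(level, bins):
    # Map a scenario in [0, 1]^2 to a discrete archive cell.
    x, y = level
    return (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))


def mutate(level, rng, sigma=0.1):
    # Perturb each scenario feature with Gaussian noise, clipped to [0, 1].
    return tuple(min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in level)


def map_elites(iterations=2000, bins=5, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (scenario, regret)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Mostly mutate an existing elite chosen uniformly at random.
            parent = archive[rng.choice(list(archive))][0]
            level = mutate(parent, rng)
        else:
            # Occasionally sample a fresh random scenario.
            level = (rng.random(), rng.random())
        cell = descriptor(level, bins)
        score = regret(level)
        # Keep only the highest-regret scenario found for each cell.
        if cell not in archive or score > archive[cell][1]:
            archive[cell] = (level, score)
    return archive


if __name__ == "__main__":
    elites = map_elites()
    for cell, (level, score) in sorted(elites.items()):
        print(cell, [round(v, 2) for v in level], round(score, 3))
```

Keeping one elite per cell is what yields a diverse set of failure cases instead of a single worst case: each cell holds the highest-regret scenario found for that region of the scenario space.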