In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Despite their outstanding performance in familiar environments, these systems often falter in new situations because they overfit during training. This is especially pronounced when both cooperative and competitive behaviours are present, since overfitting and generalisation challenges then arise on both the cooperative and the competitive side. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Drawing on concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, using the target policy's regret to gauge how vulnerable the policy is in each setting. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID to generate a diverse array of adversarial settings for TiZero, the state-of-the-art approach which "masters" the game through 45 days of training on a large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the importance of rigorous evaluation in multi-agent systems.
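To make the idea of regret-guided, diversity-preserving search concrete, the following is a minimal sketch and not the authors' implementation. It assumes a MAP-Elites-style archive (the abstract does not specify the search procedure), a toy two-dimensional scenario encoding, and regret estimated as the return of a reference policy minus the return of the target policy. Every name here (`estimate_regret`, `cell_of`, `mutate`, `illuminate`, and the toy return functions) is hypothetical and introduced only for illustration.

```python
import random


def estimate_regret(scenario, target_return, reference_return):
    """Estimated regret: reference policy's return minus the target policy's return."""
    return reference_return(scenario) - target_return(scenario)


def cell_of(scenario, bins=4):
    """Map a scenario in [0, 1]^2 to a discrete archive cell (assumed descriptor)."""
    x, y = scenario
    return (min(int(x * bins), bins - 1), min(int(y * bins), bins - 1))


def mutate(scenario, rng, scale=0.1):
    """Perturb each coordinate of a scenario, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, v + rng.uniform(-scale, scale))) for v in scenario)


def illuminate(target_return, reference_return, iterations=500, bins=4, seed=0):
    """Keep, for every cell, the scenario with the highest estimated regret."""
    rng = random.Random(seed)
    archive = {}  # cell -> (regret, scenario)
    for _ in range(iterations):
        if archive and rng.random() < 0.9:
            # Usually mutate a scenario that is already in the archive.
            _, parent = rng.choice(list(archive.values()))
            candidate = mutate(parent, rng)
        else:
            # Occasionally sample a fresh random scenario.
            candidate = (rng.random(), rng.random())
        regret = estimate_regret(candidate, target_return, reference_return)
        cell = cell_of(candidate, bins)
        if cell not in archive or regret > archive[cell][0]:
            archive[cell] = (regret, candidate)
    return archive


# Toy stand-ins for policy evaluation (assumptions, not real football rollouts).
def toy_target_return(scenario):
    x, _ = scenario
    return 1.0 - x  # the target policy does worse as x grows


def toy_reference_return(scenario):
    return 1.0  # the reference policy always achieves the best return


if __name__ == "__main__":
    archive = illuminate(toy_target_return, toy_reference_return)
    for cell in sorted(archive):
        regret, scenario = archive[cell]
        print(cell, round(regret, 3), scenario)
```

In this sketch the archive keeps one high-regret scenario per cell, so the search returns a spread of distinct weaknesses instead of many variants of a single worst case. In a real setting the toy return functions would be replaced by rollouts of the target and reference policies in the environment.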