In the rapidly advancing field of multi-agent systems, ensuring robustness in unfamiliar and adversarial settings is crucial. Notwithstanding their outstanding performance in familiar environments, these systems often falter in new situations due to overfitting during the training phase. This is especially pronounced in settings where both cooperative and competitive behaviours are present, which combine the challenges of overfitting and generalisation. To address this issue, we present Multi-Agent Diagnostics for Robustness via Illuminated Diversity (MADRID), a novel approach for generating diverse adversarial scenarios that expose strategic vulnerabilities in pre-trained multi-agent policies. Leveraging concepts from open-ended learning, MADRID navigates the vast space of adversarial settings, employing the target policy's regret to gauge the vulnerability each setting exposes. We evaluate the effectiveness of MADRID on the 11vs11 version of Google Research Football, one of the most complex environments for multi-agent reinforcement learning. Specifically, we employ MADRID to generate a diverse array of adversarial settings for TiZero, the state-of-the-art approach that "masters" the game through 45 days of training on large-scale distributed infrastructure. We expose key shortcomings in TiZero's tactical decision-making, underlining the crucial importance of rigorous evaluation in multi-agent systems.
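The search described above can be sketched as a MAP-Elites-style loop: maintain an archive of settings keyed by a behavioural descriptor, mutate archived settings, and keep the highest-regret setting per archive cell, where regret is estimated as the gap between a reference policy's return and the target policy's return in that setting. This is a simplified illustration under stated assumptions, not the actual MADRID implementation; all function names (`descriptor`, `mutate`, `target_return`, `reference_return`) are hypothetical placeholders for environment-specific components.

```python
import random

def madrid_sketch(init_settings, descriptor, target_return,
                  reference_return, mutate, iterations=1000, rng=None):
    """Illustrative sketch of regret-guided quality-diversity search.

    descriptor(setting)       -> hashable archive cell (behavioural niche)
    target_return(setting)    -> return of the pre-trained target policy
    reference_return(setting) -> return of a reference (e.g. best-known) policy
    mutate(setting, rng)      -> perturbed copy of a setting
    """
    rng = rng or random.Random(0)
    archive = {}  # cell -> (setting, estimated regret of the target policy)
    for s in init_settings:
        regret = reference_return(s) - target_return(s)
        archive[descriptor(s)] = (s, regret)
    for _ in range(iterations):
        # Sample an elite, mutate it, and score the offspring by regret.
        parent, _ = rng.choice(list(archive.values()))
        child = mutate(parent, rng)
        regret = reference_return(child) - target_return(child)
        cell = descriptor(child)
        # Keep the highest-regret setting per cell: a diverse collection of
        # settings that each expose a distinct weakness of the target policy.
        if cell not in archive or regret > archive[cell][1]:
            archive[cell] = (child, regret)
    return archive
```

In a toy instantiation a "setting" can be a single scalar, with the descriptor bucketing it into integer cells; in the paper's setting it would be a full game scenario (e.g. ball and player positions), with returns estimated by rollouts.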