In cooperative Multi-Agent Reinforcement Learning (MARL), it is common practice to tune hyperparameters in idealized simulated environments to maximize cooperative performance. However, policies tuned for cooperation often fail to maintain robustness and resilience under real-world uncertainties. Building trustworthy MARL systems requires a deep understanding of robustness, which ensures stability under uncertainties, and resilience, the ability to recover from disruptions, a concept extensively studied in control systems but largely overlooked in MARL. In this paper, we present a large-scale empirical study comprising over 82,620 experiments to evaluate cooperation, robustness, and resilience in MARL across 4 real-world environments, 13 uncertainty types, and 15 hyperparameters. Our key findings are: (1) Under mild uncertainty, optimizing cooperation improves robustness and resilience, but this link weakens as perturbations intensify. Robustness and resilience also vary by algorithm and uncertainty type. (2) Robustness and resilience do not generalize across uncertainty modalities or agent scopes: policies robust to action noise applied to all agents may fail under observation noise applied to a single agent. (3) Hyperparameter tuning is critical for trustworthy MARL: surprisingly, standard practices like parameter sharing, GAE, and PopArt can hurt robustness, while early stopping, high critic learning rates, and Leaky ReLU consistently help. By optimizing hyperparameters alone, we observe substantial improvements in cooperation, robustness, and resilience across all MARL backbones, and this phenomenon also generalizes to robust MARL methods built on these backbones. Code and results are available at https://github.com/BUAA-TrustworthyMARL/adv_marl_benchmark.
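To make finding (2) concrete, the sketch below illustrates one way such uncertainty scopes can be instantiated: Gaussian observation noise injected for a chosen subset of agents, with a simple robustness score defined as the ratio of perturbed to clean episodic return. This is a minimal illustration assuming vectorized per-agent observations; the function names and the score definition are illustrative and not taken from the benchmark's codebase.

```python
import numpy as np

def perturb_observations(obs, sigma, agent_ids=None, rng=None):
    """Add Gaussian noise (std=sigma) to selected agents' observations.

    obs: array of shape (n_agents, obs_dim).
    agent_ids: iterable of agent indices to perturb; None perturbs all agents,
    modeling the "all agents" vs "single agent" uncertainty scopes.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noisy = obs.copy()
    idx = range(obs.shape[0]) if agent_ids is None else agent_ids
    for i in idx:
        noisy[i] += rng.normal(0.0, sigma, size=obs.shape[1])
    return noisy

def robustness_score(clean_return, perturbed_return):
    """Ratio of perturbed to clean episodic return; 1.0 means no degradation."""
    return perturbed_return / clean_return
```

Under this setup, a policy evaluated with `agent_ids=None` (all agents perturbed) and with `agent_ids=[k]` (one agent perturbed) can yield very different scores, which is the non-generalization across agent scopes the abstract describes.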