We study robustness to agent malfunctions in cooperative multi-agent reinforcement learning (MARL), a failure mode that is critical in practice yet underexplored in existing theory. We introduce MARTA, a plug-and-play robustness layer that augments standard MARL algorithms with a Switcher-Adversary mechanism that selectively induces malfunctions in performance-critical states. This formulation defines a fault-switching $(N+2)$-player Markov game in which the Switcher chooses when and which agent fails, and the Adversary controls the resulting faulty behaviour via random or worst-case policies. We develop a Q-learning-type scheme and show that the associated Bellman operator is a contraction, yielding existence and uniqueness of the minimax value and convergence to a Markov perfect equilibrium. MARTA integrates seamlessly with MARL algorithms without architectural modification and consistently improves robustness across Traffic Junction (TJ), Level-Based Foraging (LBF), MPE SimpleTag, and SMAC (v2). In these domains, MARTA achieves gains in final performance of up to 116.7\% in SMAC, 21.4\% in MPE SimpleTag, and 44.6\% in LBF, while significantly reducing failure rates under train-test mismatched fault regimes. These results establish MARTA as a theoretically grounded and practically deployable mechanism for fault-tolerant MARL.
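To make the contraction claim concrete, the following is a minimal sketch of one plausible form of the fault-switching Bellman operator under the worst-case (not random) fault policy; the operator symbol $\mathcal{T}$, the no-fault index $k=0$, and the fault action set $\tilde{\mathcal{A}}_k$ are illustrative notation introduced here, not taken from the paper.

% Hypothetical notation (not from the paper): k indexes the agent the Switcher
% faults (k=0 means no fault is induced and the joint action is unchanged);
% \tilde{a}_k is the Adversary's worst-case replacement action for agent k.
\[
(\mathcal{T}V)(s) \;=\; \max_{\boldsymbol{a}\in\mathcal{A}} \;\min_{k\in\{0,1,\dots,N\}} \;\min_{\tilde{a}_k\in\tilde{\mathcal{A}}_k} \Big[\, r\big(s,\boldsymbol{a}_{-k},\tilde{a}_k\big) \;+\; \gamma \sum_{s'} P\big(s'\mid s,\boldsymbol{a}_{-k},\tilde{a}_k\big)\, V(s') \,\Big],
\]
where the team maximizes over the joint action $\boldsymbol{a}$, the Switcher minimizes over the faulted agent $k$, and the Adversary minimizes over the replacement action $\tilde{a}_k$. Because $\max$ and $\min$ are nonexpansive, $\|\mathcal{T}V-\mathcal{T}W\|_\infty \le \gamma\,\|V-W\|_\infty$ for $\gamma<1$, which is the standard route to the existence and uniqueness of the minimax value claimed in the abstract.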