Many methods for Multi-Agent Reinforcement Learning (MARL) assume that agents' policies act on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose the State-Adversarial Markov Game (SAMG) and make the first attempt to investigate the fundamental properties of MARL under state uncertainties. Our analysis shows that two commonly used solution concepts, the optimal agent policy and the robust Nash equilibrium, do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called the robust agent policy, in which agents aim to maximize the worst-case expected state value. We prove that a robust agent policy exists for SAMGs with finite state and finite action spaces. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
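To make the worst-case objective concrete, the following is a minimal sketch on a hypothetical toy problem that is not taken from the paper: a single agent, one decision step, two states, two actions, a uniform initial state distribution, and deterministic policies only. The reward table `REWARD`, the perturbation sets `PERTURB`, and all function names are assumptions introduced for illustration. It is neither the SAMG formulation nor the RMA3C algorithm; it only shows how a policy that is best under accurate observations can differ from one that maximizes the worst-case expected value when an adversary chooses what the agent observes.

```python
from itertools import product

# Hypothetical toy problem (illustrative assumption, not from the paper).
STATES = (0, 1)
ACTIONS = (0, 1)
REWARD = [[1.0, 0.0], [0.0, 2.0]]  # REWARD[true_state][action]
# PERTURB[s]: observations the adversary may show when the true state is s.
PERTURB = {0: (0, 1), 1: (0, 1)}


def nominal_value(policy):
    """Expected reward when the agent observes the true state."""
    return sum(REWARD[s][policy[s]] for s in STATES) / len(STATES)


def worst_case_value(policy):
    """Expected reward when the adversary picks the worst allowed observation."""
    total = 0.0
    for s in STATES:
        # The agent acts on the observation o, but is rewarded in true state s.
        total += min(REWARD[s][policy[o]] for o in PERTURB[s])
    return total / len(STATES)


# A deterministic policy is a tuple: policy[observation] -> action.
policies = list(product(ACTIONS, repeat=len(STATES)))
nominal_best = max(policies, key=nominal_value)
robust_best = max(policies, key=worst_case_value)

print("nominal-best policy:", nominal_best,
      "worst-case value:", worst_case_value(nominal_best))
print("robust policy:", robust_best,
      "worst-case value:", worst_case_value(robust_best))
```

In this toy setting the policy that matches actions to observed states scores highest without perturbations, yet its worst-case value collapses once the adversary can swap observations, while a policy that ignores the observation keeps a higher guaranteed value.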