Many Multi-Agent Reinforcement Learning (MARL) methods assume that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose the State-Adversarial Markov Game (SAMG) and make the first attempt to investigate the fundamental properties of MARL under state uncertainties. Our analysis shows that two commonly used solution concepts, the optimal agent policy and the robust Nash equilibrium, do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called the robust agent policy, in which agents aim to maximize the worst-case expected state value. We prove that a robust agent policy exists for SAMGs with finite state and action spaces. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
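To make the idea of maximizing a worst-case expected value concrete, the following is a minimal sketch on a hypothetical toy problem that is not taken from the paper: a single agent, a single decision step, two true states, two actions, and an adversary that may show the agent either state as its observation. The reward table, perturbation sets, initial distribution, and all function names are assumptions chosen for illustration; this is neither the SAMG formalism nor the RMA3C algorithm, and it considers deterministic policies only.

```python
from itertools import product

# Hypothetical toy problem (illustrative assumption, not from the paper).
# REWARD[true_state][action]: state 0 prefers action 0, state 1 prefers action 1.
REWARD = [[1.0, 0.0], [0.0, 2.0]]
# Observations the adversary is allowed to show in each true state.
PERTURB_SET = {0: [0, 1], 1: [0, 1]}
# Distribution over the true initial state.
INIT_DIST = [0.5, 0.5]


def nominal_value(policy):
    """Expected reward when the agent observes the true state."""
    return sum(p * REWARD[s][policy[s]] for s, p in enumerate(INIT_DIST))


def worst_case_value(policy):
    """Expected reward when the adversary picks the worst allowed
    observation independently in every true state."""
    return sum(
        p * min(REWARD[s][policy[obs]] for obs in PERTURB_SET[s])
        for s, p in enumerate(INIT_DIST)
    )


# A deterministic policy is a tuple: policy[observation] -> action.
POLICIES = list(product(range(2), repeat=2))

nominal_best = max(POLICIES, key=nominal_value)
robust_best = max(POLICIES, key=worst_case_value)

if __name__ == "__main__":
    for name, pol in [("nominal", nominal_best), ("robust", robust_best)]:
        print(name, pol, nominal_value(pol), worst_case_value(pol))
```

In this toy setting, the policy that is best under accurate observations trusts what it sees and drops to a worst-case value of 0 once the adversary swaps the observations, whereas the maximin policy ignores the observation and secures a worst-case value of 1.0 at the cost of a lower nominal value.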