In real-world multi-agent reinforcement learning (MARL) applications, agents may not have perfect state information (e.g., due to inaccurate measurements or malicious attacks), which challenges the robustness of agents' policies. Although robustness is becoming increasingly important in MARL deployment, little prior work has studied state uncertainty in MARL, in either problem formulation or algorithm design. Motivated by this robustness issue and the lack of corresponding studies, we study the problem of MARL with state uncertainty. We provide the first attempt at a theoretical and empirical analysis of this challenging problem. We first model the problem as a Markov Game with state perturbation adversaries (MG-SPA) by introducing a set of state perturbation adversaries into a Markov Game. We then introduce robust equilibrium (RE) as the solution concept of an MG-SPA. We conduct a fundamental analysis of MG-SPA, including conditions under which such a robust equilibrium exists. We then propose a robust multi-agent Q-learning (RMAQ) algorithm to find such an equilibrium, with convergence guarantees. To handle high-dimensional state-action spaces, we design a robust multi-agent actor-critic (RMAAC) algorithm based on an analytical expression of the policy gradient derived in the paper. Our experiments show that the proposed RMAQ algorithm converges to the optimal value function, and that our RMAAC algorithm outperforms several MARL and robust MARL methods in multiple multi-agent environments when state uncertainty is present. The source code is publicly available at \url{https://github.com/sihongho/robust_marl_with_state_uncertainty}.
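As a rough illustration of the state-perturbation setting described above, the following Python toy shows how an adversary that controls what an agent perceives can lower the agent's reward. It is only a sketch under strong simplifying assumptions (one agent, one step, a fixed deterministic policy) and is not the MG-SPA formulation, RMAQ, or RMAAC. Every name and number in it (`reward`, `policy`, `perturb_set`, `worst_case_perturbation`) is hypothetical and chosen purely for illustration.

```python
# Hypothetical toy: 3 true states, 2 actions.
# reward[s][a] is the agent's reward for taking action a in true state s.
reward = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 0.0],
]

# Fixed deterministic policy: the action is chosen from the *perceived* state.
policy = [0, 1, 0]

# Assumed perturbation sets: the states an adversary may show the agent
# when the true state is s (the true state itself is always allowed here).
perturb_set = {0: [0, 1], 1: [1], 2: [0, 2]}


def worst_case_perturbation(s):
    """Return the perceived state that minimizes the agent's reward in true state s."""
    return min(perturb_set[s], key=lambda s_hat: reward[s][policy[s_hat]])


def value(s, perturbed):
    """One-step reward in true state s, with or without an adversarial perturbation."""
    s_hat = worst_case_perturbation(s) if perturbed else s
    return reward[s][policy[s_hat]]


clean = [value(s, perturbed=False) for s in range(3)]
robust = [value(s, perturbed=True) for s in range(3)]

print("reward with true states:     ", clean)
print("reward under worst-case view:", robust)
```

In this toy, the policy is optimal when it sees the true state, yet in state 0 the adversary can show state 1 and drive the reward to zero. This gap is the kind of vulnerability that motivates treating the perturbation as a player in the game rather than as noise.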