Decentralized multi-agent reinforcement learning (MARL) algorithms have become popular in the literature because they allow heterogeneous agents to have their own reward functions, in contrast to canonical multi-agent Markov Decision Process (MDP) settings, which assume a common reward function shared by all agents. In this work, we follow existing work on collaborative MARL, in which agents in a connected, time-varying network exchange information with one another in order to reach a consensus. We expose vulnerabilities in the consensus updates of existing MARL algorithms: some agents, which we term adversarial agents, can deviate from the usual consensus update. We then provide an algorithm that allows non-adversarial agents to reach a consensus in the presence of adversaries under a constrained setting.
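To make the vulnerability concrete, the following is a minimal sketch of a linear consensus update and of how a single deviating agent can steer it. It is an illustration under stated assumptions, not the algorithm studied in this work: the network is fixed and fully connected rather than time varying, each agent holds a single scalar instead of learned parameters, and the adversary is assumed to simply hold its value constant. The names `consensus_step` and `run_consensus` are hypothetical.

```python
def consensus_step(values, weights, adversaries=frozenset()):
    """One synchronous consensus update: x_i <- sum_j weights[i][j] * x_j.

    Agents whose index is in `adversaries` skip the update and keep their
    current value (an assumed, very simple adversary model).
    """
    n = len(values)
    new_values = []
    for i in range(n):
        if i in adversaries:
            new_values.append(values[i])
        else:
            new_values.append(sum(weights[i][j] * values[j] for j in range(n)))
    return new_values


def run_consensus(values, weights, steps, adversaries=frozenset()):
    """Apply `steps` consensus updates and return the final values."""
    for _ in range(steps):
        values = consensus_step(values, weights, adversaries)
    return values


n = 4
# Assumed weight matrix: uniform averaging over a fully connected, static network.
weights = [[1.0 / n] * n for _ in range(n)]
initial = [1.0, 2.0, 3.0, 10.0]

# All agents follow the update: everyone agrees on the average of the initial values.
honest_run = run_consensus(initial, weights, steps=100)

# Agent 3 never updates: the remaining agents are pulled toward its value.
attacked_run = run_consensus(initial, weights, steps=100, adversaries={3})

print(honest_run)
print(attacked_run)
```

In this toy setting the cooperative run settles on the average of the initial values, while the run with one stubborn agent drifts toward that agent's value, so the non-adversarial agents still agree with each other but on a value chosen by the adversary.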