Attacking Cooperative Multi-Agent Reinforcement Learning by Adversarial Minority Influence

Cooperative multi-agent reinforcement learning (c-MARL) offers a general paradigm for a group of agents to achieve a shared goal by making individual decisions, yet it has been found to be vulnerable to adversarial attacks. Though harmful, adversarial attacks also play a critical role in evaluating the robustness of c-MARL algorithms and finding their blind spots. However, existing attacks are neither sufficiently strong nor practical, mainly because they ignore the complex influence between agents and the cooperative nature of the victims in c-MARL. In this paper, we propose adversarial minority influence (AMI), the first practical attack against c-MARL that works by introducing an adversarial agent. AMI addresses the aforementioned problems by unilaterally influencing the other cooperative victims toward a targeted worst-case cooperation. Technically, to maximally deviate the victim policy under complex agent-wise influence, our unilateral attack characterizes and maximizes the influence from the adversary to the victims. This is done by adapting a unilateral agent-wise relation metric derived from mutual information, which filters out the detrimental influence from the victims to the adversary. To fool the victims into a jointly worst-case failure, our targeted attack influences them toward a long-term, cooperatively worst case by distracting each victim to a specific target. This target is learned by a reinforcement learning agent in a trial-and-error process. Extensive experiments in simulation environments for discrete control (SMAC) and continuous control (MAMujoco), and on real-world robot swarm control, demonstrate the superiority of our AMI approach. Our code is available at https://anonymous.4open.science/r/AMI.
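For readers unfamiliar with the quantity the influence metric is derived from, the following is a minimal sketch of a plug-in estimate of mutual information between paired discrete samples, for example an adversary's actions and a victim's subsequent actions. It is not the AMI metric: plain mutual information is symmetric, so it cannot by itself separate adversary-to-victim influence from victim-to-adversary influence, which is what the unilateral metric described above is said to address. The function name and the toy action sequences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def empirical_mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats from paired discrete samples.

    Illustrative only: this is the standard symmetric quantity, not the
    unilateral influence metric used by AMI.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: the victim copies the adversary's binary action.
adversary_actions = [0, 1, 0, 1]
copying_victim = [0, 1, 0, 1]
# Hypothetical toy data: the victim's action is unrelated to the adversary's.
unrelated_victim = [0, 0, 1, 1]

mi_copy = empirical_mutual_information(adversary_actions, copying_victim)
mi_unrelated = empirical_mutual_information(adversary_actions, unrelated_victim)
```

In this toy case the copying victim yields the maximum value for a balanced binary variable (log 2 nats), while the unrelated victim yields zero, which matches the intuition that a higher value indicates a stronger statistical dependence between the two agents' actions.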
