Multi-agent reinforcement learning (MARL) systems based on Markov decision processes (MDPs) have emerged in many critical applications. To improve the robustness of MARL systems and their defenses against adversarial attacks, it is important to study the various adversarial attacks on reinforcement learning systems. Previous work on adversarial attacks has considered several components of the MDP as attack targets, through action poisoning attacks, reward poisoning attacks, and state perception attacks. In this paper, we propose a new form of attack on MARL systems, called the camouflage attack. In a camouflage attack, the attackers change the appearances of some objects without changing the objects themselves, and the camouflaged appearances may look the same to all the targeted recipient (victim) agents. The camouflaged appearances can mislead the recipient agents into taking misguided actions. We design algorithms that give the optimal camouflage attacks, that is, the attacks that minimize the rewards of the recipient agents. Our numerical and theoretical results show that camouflage attacks can rival the more conventional, but likely more difficult to carry out, state perception attacks. We also investigate cost-constrained camouflage attacks and show numerically how cost budgets affect the attack performance.
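As a toy illustration of the camouflage idea summarized in the abstract, the following minimal sketch searches by brute force for a camouflage that minimizes the victims' total expected reward under a cost budget. It is not the algorithm of the paper. Everything in it is an assumption made for illustration: the one-step setting, the two object types, the reward table, the fixed victim policies, and the cost model (one unit per object type whose appearance is changed). The key feature it keeps is that a single camouflaged appearance is shared by all victim agents, while the true object types, and hence the rewards, are unchanged.

```python
from itertools import product

# Hypothetical toy setting (not from the paper): one-step decisions.
TYPES = ["safe", "hazard"]          # true object types
PRIOR = {"safe": 0.5, "hazard": 0.5}  # probability of meeting each type

# Reward a victim receives for (true type, action); appearance never enters here.
REWARD = {
    ("safe", "approach"): 1.0,
    ("safe", "avoid"): 0.0,
    ("hazard", "approach"): -1.0,
    ("hazard", "avoid"): 0.0,
}

# Fixed victim policies, mapping the observed appearance to an action.
POLICIES = [
    {"safe": "approach", "hazard": "avoid"},
    {"safe": "approach", "hazard": "avoid"},
]


def cost(camouflage):
    """Assumed cost model: one unit per type whose appearance is altered."""
    return sum(1 for t in TYPES if camouflage[t] != t)


def total_reward(camouflage):
    """Expected reward summed over victims; all victims see the same appearance."""
    return sum(
        PRIOR[t] * REWARD[(t, policy[camouflage[t]])]
        for policy in POLICIES
        for t in TYPES
    )


def best_camouflage(budget):
    """Enumerate every type -> appearance map within budget; keep the worst for victims."""
    best = None
    for appearances in product(TYPES, repeat=len(TYPES)):
        camouflage = dict(zip(TYPES, appearances))
        if cost(camouflage) > budget:
            continue
        value = total_reward(camouflage)
        if best is None or value < best[1]:
            best = (camouflage, value)
    return best


if __name__ == "__main__":
    for budget in (0, 1, 2):
        print(budget, best_camouflage(budget))
```

In this toy instance, a budget of 0 leaves the victims' total expected reward at 1.0, a budget of 1 reduces it to 0.0, and a budget of 2 (swapping both appearances) reduces it to -1.0, which mirrors in miniature how a cost budget can limit attack performance. Exhaustive enumeration is used only because the example is tiny; it does not scale to realistic problems.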