Finding optimal adversarial attack strategies is an important topic in reinforcement learning and Markov decision processes. Previous studies usually assume a single all-knowing coordinator (the attacker) that incurs a uniform cost for attacking any recipient (victim) agent. In practice, however, attacks often must be carried out by distributed attack agents rather than by one unconstrained central attacker. We formulate the problem of performing optimal adversarial agent-to-agent attacks with distributed attack agents, imposing a distinct cost constraint on each attacker-victim pair. We propose an optimal method that integrates within-step static constrained attack-resource allocation optimization with between-step dynamic programming to achieve the optimal adversarial attack in a multi-agent system. Our numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.
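To make the two-level structure concrete, the following is a minimal toy sketch, not the formulation used in this work. It assumes integer pairwise costs, a victim-specific damage that does not depend on which attacker is used, a single attack budget shared across all time steps, and no state dynamics. The names `step_damage`, `plan_attack`, `costs`, and `damages` are hypothetical. The within-step allocation is solved as a 0/1 knapsack, and the between-step problem is solved by dynamic programming over the remaining budget.

```python
from functools import lru_cache


def step_damage(costs, damages, budget):
    """Within-step allocation (toy assumption): max damage within `budget`.

    costs[i][j] is the integer cost for attacker i to hit victim j;
    damages[j] is the reward reduction if victim j is hit at all.
    """
    n_victims = len(damages)
    # Damage does not depend on the attacker, so pick the cheapest one per victim.
    cheapest = [min(row[j] for row in costs) for j in range(n_victims)]
    # 0/1 knapsack over victims: best[b] = max damage with total cost <= b.
    best = [0.0] * (budget + 1)
    for cost, dmg in zip(cheapest, damages):
        for b in range(budget, cost - 1, -1):
            best[b] = max(best[b], best[b - cost] + dmg)
    return best[budget]


def plan_attack(steps, total_budget):
    """Between-step DP (toy assumption): split the budget across time steps.

    steps is a list of (costs, damages) pairs, one per time step.
    Returns (total damage, list of per-step spending).
    """
    @lru_cache(maxsize=None)
    def value(t, remaining):
        if t == len(steps):
            return 0.0, ()
        costs, damages = steps[t]
        best = (-1.0, ())
        for spend in range(remaining + 1):
            now = step_damage(costs, damages, spend)
            later, plan = value(t + 1, remaining - spend)
            if now + later > best[0]:
                best = (now + later, (spend,) + plan)
        return best

    total, plan = value(0, total_budget)
    return total, list(plan)


# Two attackers, two victims, two time steps (all numbers are made up).
steps = [
    ([[2, 3], [1, 4]], [5.0, 6.0]),
    ([[2, 2], [3, 1]], [4.0, 1.0]),
]
print(plan_attack(steps, 4))  # (11.0, [4, 0])
```

In this toy instance, spending the whole budget in the first step is best because both first-step victims can be hit for a combined cost of 4. A faithful implementation would replace the knapsack with the pairwise-constrained allocation problem and carry the system state through the dynamic program.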