Reinforcement learning (RL) agents are known to be vulnerable to evasion attacks during deployment. In single-agent environments, an attacker can inject imperceptible perturbations into the inputs or outputs of the policy or value network; in multi-agent environments, an attacker can control an adversarial opponent to indirectly influence the victim's observations. Adversarial policies offer a promising way to craft such attacks. However, current approaches either require perfect or partial knowledge of the victim policy or suffer from sample inefficiency because task-related rewards are sparse. To overcome these limitations, we propose the Intrinsically Motivated Adversarial Policy (IMAP), which enables efficient black-box evasion attacks in single- and multi-agent environments without any knowledge of the victim policy. IMAP uses four intrinsic objectives, based on state coverage, policy coverage, risk, and policy divergence, to encourage exploration and discover stronger attacking skills. We also design a novel Bias-Reduction (BR) method to further boost IMAP. Our experiments demonstrate that these intrinsic objectives and BR improve adversarial policy learning in the black-box setting against multiple types of victim agents in various single- and multi-agent MuJoCo environments. Notably, IMAP reduces the performance of the state-of-the-art robust WocaR-PPO agents by 34%-54% and achieves a state-of-the-art attack success rate of 83.91% in the two-player zero-sum game YouShallNotPass.
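To make the idea of an intrinsic objective concrete, the following is a minimal sketch of how a state-coverage bonus might be added to a sparse task reward when training an adversary. The abstract does not give IMAP's formulas, so everything here is an assumption for illustration: the k-nearest-neighbor novelty bonus is a generic exploration heuristic swapped in for the paper's objectives, and the names `knn_coverage_bonus`, `shaped_reward`, `beta`, and `k` are hypothetical.

```python
import math


def knn_coverage_bonus(state, visited, k=3):
    """Hypothetical state-coverage bonus (not IMAP's actual objective).

    Returns log(1 + d), where d is the Euclidean distance from `state`
    to its k-th nearest neighbor among previously visited states.
    States far from anything seen so far receive a larger bonus.
    """
    if not visited:
        return 0.0  # assumption: no bonus before any state has been recorded
    dists = sorted(math.dist(state, s) for s in visited)
    kth = dists[min(k, len(dists)) - 1]  # fall back to the farthest if fewer than k
    return math.log1p(kth)


def shaped_reward(task_reward, state, visited, beta=0.1, k=3):
    """Adversary's training reward: task reward plus a weighted intrinsic bonus.

    `beta` is an assumed trade-off coefficient between the sparse task
    reward and the exploration bonus.
    """
    return task_reward + beta * knn_coverage_bonus(state, visited, k)


# Usage: a state near the visited set earns a smaller bonus than a distant one.
visited = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
near = shaped_reward(0.0, (0.05, 0.05), visited)
far = shaped_reward(0.0, (3.0, 4.0), visited)
print(near < far)
```

In this sketch the bonus only needs the adversary's own visited states, which fits the black-box setting described above: no gradients or parameters of the victim policy are used.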