We study data poisoning attacks on online deep reinforcement learning (DRL) in which the attacker is oblivious to the learning algorithm used by the agent and does not necessarily have full knowledge of the environment. We demonstrate the intrinsic vulnerability of state-of-the-art DRL algorithms by designing a general reward poisoning framework called adversarial MDP attacks. We instantiate our framework to construct several new attacks that corrupt the rewards for only a small fraction of the total training timesteps and make the agent learn a low-performing policy. Our key insight is that state-of-the-art DRL algorithms strategically explore the environment to find a high-performing policy. Our attacks leverage this insight to construct a corrupted environment that misleads the agent towards learning low-performing policies with a limited attack budget. We provide a theoretical analysis of the efficiency of our attack and perform an extensive empirical evaluation. Our results show that our attacks efficiently poison agents trained with a variety of state-of-the-art DRL algorithms, such as DQN, PPO, and SAC, in several popular classical control and MuJoCo environments.
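The threat model above can be made concrete with a minimal sketch: an environment wrapper that corrupts the reward signal on a bounded fraction of timesteps while the agent trains. Note this is a hypothetical illustration of the reward-poisoning setting only, with timesteps chosen uniformly at random; the paper's adversarial MDP attacks choose what and when to corrupt strategically. The `ToyEnv`, `RewardPoisoningWrapper`, and parameter names are assumptions made for this sketch, not the paper's API.

```python
import random


class ToyEnv:
    """Minimal stand-in environment: every step returns reward +1."""

    def step(self, action):
        obs, reward, done = 0, 1.0, False
        return obs, reward, done


class RewardPoisoningWrapper:
    """Corrupts the reward on a small, randomly chosen fraction of
    timesteps.

    Hypothetical illustration of the threat model only: the attacker
    is oblivious to the agent's learning algorithm and touches only
    the reward channel, under a budget expressed as a fraction of
    training timesteps. The paper's attacks pick timesteps
    strategically rather than uniformly at random.
    """

    def __init__(self, env, poison_fraction=0.05, corrupted_reward=-1.0, seed=0):
        self.env = env
        self.poison_fraction = poison_fraction  # attack budget (fraction of steps)
        self.corrupted_reward = corrupted_reward
        self.rng = random.Random(seed)

    def step(self, action):
        obs, reward, done = self.env.step(action)
        # Poison roughly poison_fraction of the timesteps.
        if self.rng.random() < self.poison_fraction:
            reward = self.corrupted_reward
        return obs, reward, done
```

A DRL agent trained against the wrapped environment instead of the true one then optimizes the corrupted reward, which is what drives it toward a low-performing policy under the true reward.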