Deep reinforcement learning has advanced greatly and has been applied in many areas. In this paper, we explore the vulnerability of deep reinforcement learning by proposing a novel generative model that creates effective adversarial examples to attack the agent. The proposed model supports both targeted and untargeted attacks. Considering the specific characteristics of deep reinforcement learning, we propose the action consistency ratio as a measure of stealthiness, together with a new index that jointly measures effectiveness and stealthiness. Experimental results show that, compared with other algorithms, our method achieves attacks that are both effective and stealthy. Moreover, our method is considerably faster, and thus enables rapid and efficient verification of the vulnerability of deep reinforcement learning.
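As a rough illustration of the action consistency ratio, the sketch below computes the fraction of time steps at which the agent selects the same action on the adversarial observation as on the clean one. This reading is an assumption made for illustration only: the abstract does not give the formula, and the function name `action_consistency_ratio` and the example action sequences are hypothetical.

```python
from typing import Sequence


def action_consistency_ratio(
    clean_actions: Sequence[int],
    attacked_actions: Sequence[int],
) -> float:
    """Fraction of time steps where the attacked action equals the clean action.

    Assumed definition for illustration; the paper's exact formula may differ.
    """
    if len(clean_actions) != len(attacked_actions):
        raise ValueError("both action sequences must cover the same time steps")
    if len(clean_actions) == 0:
        raise ValueError("at least one time step is required")
    # Count the time steps on which the two action choices agree.
    matches = sum(c == a for c, a in zip(clean_actions, attacked_actions))
    return matches / len(clean_actions)


# Hypothetical discrete actions over four time steps.
clean = [0, 1, 2, 1]     # actions chosen on clean observations
attacked = [0, 1, 0, 1]  # actions chosen on adversarial observations
ratio = action_consistency_ratio(clean, attacked)
```

Under this assumed definition the ratio lies in [0, 1], and the example above agrees on three of four steps. How the ratio maps onto stealthiness, and how it enters the combined index, is determined by the paper's own definitions rather than by this sketch.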