Adversarial attacks in reinforcement learning (RL) often assume highly privileged access to the victim's parameters, environment, or data. In contrast, this paper proposes a novel adversarial setting called a Cheap Talk MDP, in which an Adversary can only append deterministic messages to the Victim's observation, resulting in a minimal range of influence. The Adversary cannot occlude ground truth, influence the underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access the Victim's parameters. Additionally, we present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's train-time and test-time performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary can harm the Victim's performance by interfering with the learner's function approximation, or instead help it by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages at train-time to directly and arbitrarily control the Victim at test-time. Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk
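To make the Cheap Talk constraint concrete, the following is a minimal sketch of how the Victim's observation could be formed, under our own assumptions. It is not the released implementation: the names `make_cheap_talk_observation` and `linear_adversary`, the linear form of the Adversary, and the clipping of messages to [-1, 1] are all hypothetical choices made for illustration. The only properties taken from the setting described above are that the message is a deterministic function of the state and that it is appended to, rather than substituted for, the ground-truth observation.

```python
from typing import Callable, List, Sequence


def make_cheap_talk_observation(
    state: Sequence[float],
    adversary: Callable[[Sequence[float]], Sequence[float]],
) -> List[float]:
    """Return the Victim's observation: the true state followed by the message."""
    message = adversary(state)          # depends on the state only, no randomness
    return list(state) + list(message)  # ground truth is appended to, never occluded


def linear_adversary(
    weights: Sequence[Sequence[float]],
) -> Callable[[Sequence[float]], List[float]]:
    """Hypothetical Adversary: a fixed linear map, clipped to [-1, 1] (an assumption)."""
    def adversary(state: Sequence[float]) -> List[float]:
        return [
            max(-1.0, min(1.0, sum(w * s for w, s in zip(row, state))))
            for row in weights
        ]
    return adversary


# Illustrative values only; in ACT the Adversary's parameters would be meta-learned.
adversary = linear_adversary([[0.5, -0.5], [2.0, 0.0]])
state = [1.0, 0.5]
obs = make_cheap_talk_observation(state, adversary)
print(obs)  # [1.0, 0.5, 0.25, 1.0]
```

In this sketch the first two entries of `obs` are the unmodified state and the last two are the message. The Adversary never touches rewards, transitions, or the Victim's actions, so any influence it has must pass through the Victim's own learning on the appended features.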