Adversarial attacks in reinforcement learning (RL) often assume highly privileged access to the victim's parameters, environment, or data. In contrast, this paper proposes a novel adversarial setting called a Cheap Talk MDP, in which an Adversary can only append deterministic messages to the Victim's observation, giving it a minimal range of influence. The Adversary cannot occlude the ground truth, influence the underlying environment dynamics or reward signals, introduce non-stationarity, add stochasticity, see the Victim's actions, or access its parameters. We also present a simple meta-learning algorithm called Adversarial Cheap Talk (ACT) to train Adversaries in this setting. We demonstrate that an Adversary trained with ACT still significantly influences the Victim's training and testing performance, despite the highly constrained setting. Affecting train-time performance reveals a new attack vector and provides insight into the success and failure modes of existing RL algorithms. More specifically, we show that an ACT Adversary can harm performance by interfering with the learner's function approximation or, conversely, help the Victim's performance by outputting useful features. Finally, we show that an ACT Adversary can manipulate messages during train-time to directly and arbitrarily control the Victim at test-time. Project video and code are available at https://sites.google.com/view/adversarial-cheap-talk
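To make the constraints of the Cheap Talk MDP concrete, the following is a minimal sketch of the observation-appending mechanism only; it is not the authors' implementation and does not cover the ACT meta-learning procedure. It assumes the message is a deterministic function of the current observation. The names `linear_adversary`, `cheap_talk_observation`, `CheapTalkEnv`, and `ToyEnv`, as well as the tanh-linear parameterization and the toy environment, are hypothetical and chosen purely for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Adversary = Callable[[Sequence[float]], List[float]]


def linear_adversary(weights: Sequence[Sequence[float]]) -> Adversary:
    """Hypothetical adversary: message = tanh(W @ obs), with no internal randomness."""
    def adversary(obs: Sequence[float]) -> List[float]:
        return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in weights]
    return adversary


def cheap_talk_observation(obs: Sequence[float], adversary: Adversary) -> List[float]:
    """Append the message to the observation; the original entries are left untouched."""
    return list(obs) + adversary(obs)


class ToyEnv:
    """Illustrative 1-D environment (an assumption, not from the paper)."""

    def __init__(self, target: float = 3.0, horizon: int = 5) -> None:
        self.target, self.horizon = target, horizon
        self.x, self.t = 0.0, 0

    def reset(self) -> List[float]:
        self.x, self.t = 0.0, 0
        return [self.x]

    def step(self, action: float) -> Tuple[List[float], float, bool]:
        self.x += action
        self.t += 1
        reward = -abs(self.x - self.target)
        return [self.x], reward, self.t >= self.horizon


class CheapTalkEnv:
    """Wrapper in which the adversary's only channel is the appended message."""

    def __init__(self, env: ToyEnv, adversary: Adversary) -> None:
        self.env, self.adversary = env, adversary

    def reset(self) -> List[float]:
        return cheap_talk_observation(self.env.reset(), self.adversary)

    def step(self, action: float) -> Tuple[List[float], float, bool]:
        # The action is forwarded to the environment only; the adversary never sees it.
        obs, reward, done = self.env.step(action)
        # Reward and termination pass through unchanged.
        return cheap_talk_observation(obs, self.adversary), reward, done
```

In this sketch the Victim would train on the output of `CheapTalkEnv.step`, whose leading entries are always the unmodified environment observation. The wrapper never alters the reward, the dynamics, or the termination signal, and the adversary receives neither the Victim's action nor its parameters, which mirrors the restrictions listed in the abstract.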