We study reward poisoning attacks against general offline reinforcement learning (RL) that uses deep neural networks for function approximation. We consider a black-box threat model in which the attacker is completely oblivious to the learning algorithm, and its budget is limited by constraining both the amount of corruption at each data point and the total perturbation. We propose an attack strategy called the "policy contrast attack". The high-level idea is to make some low-performing policies appear high-performing while making high-performing policies appear low-performing. To the best of our knowledge, this is the first black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights into the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms across different kinds of learning datasets.
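The two-part budget described above can be illustrated with a minimal sketch. It assumes the per-point limit is a bound on the absolute change to each reward and the total limit is a bound on the sum of absolute changes; the abstract does not state which norms are used, so both choices are assumptions. The function names `apply_budget` and `poison_rewards` are hypothetical. The sketch only enforces the budget and does not show how the attacker chooses the perturbations.

```python
from typing import List


def apply_budget(deltas: List[float], per_point_cap: float, total_cap: float) -> List[float]:
    """Force a proposed reward perturbation to respect both budget limits.

    Assumption (not from the source): the per-point limit bounds |delta_i|
    and the total limit bounds sum_i |delta_i|.
    """
    # Step 1: clip each perturbation to [-per_point_cap, per_point_cap].
    clipped = [max(-per_point_cap, min(per_point_cap, d)) for d in deltas]

    # Step 2: if the total is over budget, shrink all perturbations uniformly.
    # The scale factor is below 1, so the per-point bound still holds.
    total = sum(abs(d) for d in clipped)
    if total > total_cap and total > 0:
        scale = total_cap / total
        clipped = [d * scale for d in clipped]
    return clipped


def poison_rewards(rewards: List[float], deltas: List[float],
                   per_point_cap: float, total_cap: float) -> List[float]:
    """Return the rewards after adding the budget-respecting perturbation."""
    bounded = apply_budget(deltas, per_point_cap, total_cap)
    return [r + d for r, d in zip(rewards, bounded)]
```

Uniform rescaling is only one way to meet the total budget; an attacker could instead spend the budget on a chosen subset of data points.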