Reinforcement Unlearning

Machine unlearning is the process of mitigating the influence of specific training data on a machine learning model in response to removal requests from data owners. One important area that unlearning research has largely overlooked, however, is reinforcement learning. Reinforcement learning trains an agent to make optimal decisions within an environment so as to maximize its cumulative reward. During training, the agent tends to memorize features of the environment, which raises a significant privacy concern. Under data protection regulations, the owner of an environment has the right to revoke access to the agent's training data, which motivates a new and pressing research field that we call \emph{reinforcement unlearning}. Reinforcement unlearning focuses on revoking entire environments rather than individual data samples. This characteristic presents three distinct challenges: 1) how to design unlearning schemes for environments; 2) how to avoid degrading the agent's performance in the remaining environments; and 3) how to evaluate the effectiveness of unlearning. To tackle these challenges, we propose two reinforcement unlearning methods. The first is based on decremental reinforcement learning, which aims to gradually erase the agent's previously acquired knowledge. The second leverages environment poisoning attacks, which encourage the agent to learn new, albeit incorrect, knowledge so as to remove its knowledge of the unlearning environment. In particular, to address the third challenge, we introduce the concept of an ``environment inference attack'' to evaluate the unlearning outcomes. The source code is available at \url{https://anonymous.4open.science/r/Reinforcement-Unlearning-D347}.
