Offline safe reinforcement learning (Safe RL) enables policy learning without online interaction, making it suitable for safety-critical systems such as robotics. However, its reliance on static datasets exposes offline Safe RL to data poisoning attacks, in which adversaries inject malicious samples that compromise safety and induce unsafe policy behavior. In this work, we propose safe reinforcement unlearning (Safe-RULE), a new learning paradigm that serves as a defense framework: it removes the influence of poisoned data without retraining from scratch or requiring access to the original training environment. We further extend reinforcement unlearning to offline Safe RL by explicitly accounting for both task performance and safety constraints during the unlearning process. Experiments on benchmark Safe RL tasks demonstrate that our approach effectively improves safety performance under data poisoning attacks.