Safety guarantees are a prerequisite for deploying reinforcement learning (RL) agents in safety-critical tasks. Deployment environments often exhibit non-stationary dynamics or are subject to changing performance goals, which requires updating the learned policy. This raises a fundamental challenge: how can an RL policy be updated while preserving its safety properties on previously encountered tasks? Most current approaches either provide no formal guarantees or verify policy safety only a posteriori. We propose a novel a priori approach to safe policy updates in continual RL by introducing the Rashomon set: a region in policy parameter space certified to meet safety constraints within the demonstration data distribution. We then show that formal, provable guarantees can be obtained for arbitrary RL algorithms used to update a policy by projecting their updates onto the Rashomon set. Empirically, we validate this approach in grid-world navigation environments (Frozen Lake and Poisoned Apple), where we provide a priori, provable, deterministic safety guarantees on the source task during downstream adaptation. In contrast, regularisation-based baselines exhibit catastrophic forgetting of safety constraints, whereas our approach enables strong adaptation with provable guarantees that safety is preserved.
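The projection step described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that the certified region is an axis-aligned box in parameter space, so that Euclidean projection reduces to elementwise clipping; the abstract does not specify the shape of the Rashomon set or how it is certified. The names `project_onto_box`, `projected_update`, `theta_safe`, and `radius` are hypothetical.

```python
import numpy as np

def project_onto_box(theta, lower, upper):
    """Euclidean projection onto the box {x : lower <= x <= upper}."""
    return np.clip(theta, lower, upper)

def projected_update(theta, update, lower, upper):
    """Apply an update from any RL algorithm, then project the result
    back onto the (assumed) certified box."""
    return project_onto_box(theta + update, lower, upper)

# Hypothetical certified region around a safe source-task policy
# (assumption: a box of half-width `radius` around `theta_safe`).
theta_safe = np.array([0.5, -1.0, 2.0])
radius = 0.1
lower, upper = theta_safe - radius, theta_safe + radius

# A downstream update that would leave the region if applied directly.
update = np.array([0.3, -0.05, -1.0])
theta_new = projected_update(theta_safe, update, lower, upper)

# Coordinates that overshoot are clipped to the boundary; the second
# coordinate stays inside the box and is left unchanged.
print(theta_new)
```

Because the projection is applied after the update, the sketch places no restriction on how `update` is computed, which mirrors the claim that the guarantee holds for arbitrary RL algorithms. Whether the guarantee transfers depends on the region actually being certified, which this sketch does not address.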