Reinforcement learning (RL) has yet to see widespread real-world deployment, largely because it fails to satisfy the essential safety requirements of such systems. Existing safe reinforcement learning (SafeRL) methods, which use cost functions to enhance safety, fail to achieve zero cost in complex scenarios, including vision-only tasks, even with comprehensive data sampling and training. To address this, we introduce Safe DreamerV3, a novel algorithm that integrates both Lagrangian-based and planning-based methods within a world model. Our methodology represents a significant advancement in SafeRL: it is the first algorithm to achieve nearly zero cost in both low-dimensional and vision-only tasks within the Safety-Gymnasium benchmark. Our project website is available at https://sites.google.com/view/safedreamerv3.