Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP): the RL agent produces a sequence of actions that a trajectory optimizer transforms into safe trajectories, ensuring safety and improving training stability. The approach excels on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. Its real-world applicability is demonstrated through a safe and effective deployment on a real robot pushing a box around obstacles.
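To make the safety-layer idea concrete, here is a minimal, hypothetical sketch: the agent proposes a trajectory as a list of 2-D waypoints, and a trajectory "optimizer" — here just a geometric projection step onto a toy safe set, not the paper's actual solver — maps it to a safe trajectory before execution. The obstacle, radius, and `make_safe` helper are illustrative assumptions, not part of the paper's method.

```python
import math

# Assumed toy environment: a single circular obstacle the trajectory
# must stay outside of. These constants are illustrative only.
OBSTACLE = (0.0, 0.0)   # obstacle centre
SAFE_RADIUS = 1.0       # waypoints must stay outside this disc

def make_safe(trajectory):
    """Project each waypoint out of the obstacle disc (toy safety layer)."""
    safe = []
    for (x, y) in trajectory:
        dx, dy = x - OBSTACLE[0], y - OBSTACLE[1]
        d = math.hypot(dx, dy)
        if d < SAFE_RADIUS:
            if d < 1e-9:               # degenerate: waypoint at the centre
                dx, dy, d = 1.0, 0.0, 1.0
            scale = SAFE_RADIUS / d
            # Push the violating waypoint onto the obstacle boundary.
            safe.append((OBSTACLE[0] + dx * scale, OBSTACLE[1] + dy * scale))
        else:
            safe.append((x, y))
    return safe

# The agent proposes a straight line passing through the obstacle;
# the safety layer moves the violating waypoints onto the boundary.
proposed = [(-2.0 + 0.5 * i, 0.1) for i in range(9)]
executed = make_safe(proposed)
assert all(math.hypot(x, y) >= SAFE_RADIUS - 1e-9 for (x, y) in executed)
```

In the actual method, this projection step is replaced by a full trajectory optimizer, so the executed trajectory is both feasible for the robot and constraint-satisfying, while the RL agent learns only over the pre-optimization action sequence.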