Deep reinforcement learning (RL) has shown promise in addressing complex robotic challenges. In real-world applications, RL is often accompanied by failsafe controllers that act as a last resort to avoid catastrophic events. While necessary for safety, these interventions can result in undesirable behaviors, such as abrupt braking or aggressive steering. This paper proposes two methods for reducing safety interventions, proactive replacement and proactive projection, both of which change the agent's action when it could lead to a failsafe intervention. These approaches are compared to state-of-the-art constrained RL on the OpenAI Safety Gym benchmark and on a human-robot collaboration task. Our study demonstrates that combining our methods with provably safe RL leads to high-performing policies with zero safety violations and a low number of failsafe interventions. The approach can be applied to a wide range of real-world robotic tasks and improves safety without sacrificing task performance.
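The difference between the two intervention reduction ideas can be illustrated with a toy sketch. This is not the paper's implementation: the abstract does not say how potentially unsafe actions are detected or how safe actions are computed, so the sketch assumes a hypothetical per-dimension safe interval (`low`, `high`) outside of which the failsafe would fire. All function names are illustrative. Under that assumption, replacement swaps a flagged action for a sampled safe one, while projection moves it to the closest safe action.

```python
import random


def would_trigger_failsafe(action, low, high):
    """Hypothetical check: the failsafe fires if any component leaves [low, high]."""
    return any(a < l or a > h for a, l, h in zip(action, low, high))


def proactive_replacement(action, low, high, rng):
    """Keep a safe action; otherwise replace it with a uniformly sampled safe one."""
    if not would_trigger_failsafe(action, low, high):
        return list(action)
    return [rng.uniform(l, h) for l, h in zip(low, high)]


def proactive_projection(action, low, high):
    """Move the action to the closest point of the safe box.

    For a box-shaped safe set (an assumption of this sketch), the Euclidean
    projection reduces to component-wise clipping. Safe actions are unchanged.
    """
    return [min(max(a, l), h) for a, l, h in zip(action, low, high)]


if __name__ == "__main__":
    rng = random.Random(0)
    low, high = [-0.5, -0.5], [0.5, 0.5]
    proposed = [0.9, 0.1]  # first component would trigger the failsafe
    print(proactive_projection(proposed, low, high))
    print(proactive_replacement(proposed, low, high, rng))
```

In this sketch, projection stays as close as possible to what the policy proposed, whereas replacement discards the proposed action entirely. Which of the two works better in practice is an empirical question that the paper's experiments address.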