Deep Reinforcement Learning (RL) has shown promise in addressing complex robotic challenges. In real-world applications, RL is often accompanied by failsafe controllers as a last resort to avoid catastrophic events. While necessary for safety, these interventions can result in undesirable behaviors, such as abrupt braking or aggressive steering. This paper proposes two methods for reducing safety interventions, action replacement and action projection, both of which change the agent's action if it would lead to an unsafe state. These approaches are compared to state-of-the-art constrained RL on the OpenAI Safety Gym benchmark and a human-robot collaboration task. Our study demonstrates that combining our method with provably safe RL leads to high-performing policies with zero safety violations and a low number of failsafe interventions. Our method can be applied to a wide range of real-world robotics tasks, improving safety without sacrificing task performance.
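To make the difference between the two interventions concrete, the following is a minimal sketch on a toy point-mass task, not the paper's implementation. Everything in it is an assumption made for illustration: the box-shaped safe region, the one-step dynamics `pos + DT * action`, the uniform resampling used for replacement, and the names `is_safe`, `replace_action`, and `project_action`. Replacement swaps an unsafe action for some other safe action, while projection returns the safe action closest to the one the agent proposed.

```python
import random

# Hypothetical toy setup (not from the paper): a point mass that must stay in a box.
DT = 0.5                          # assumed integration step
POS_LOW, POS_HIGH = -1.0, 1.0     # assumed safe position range per dimension
ACT_LOW, ACT_HIGH = -1.0, 1.0     # assumed action (velocity) limits per dimension
EPS = 1e-9                        # tolerance for floating-point comparisons


def is_safe(pos, action):
    """Assumed safety check: the next position stays inside the box."""
    return all(
        POS_LOW - EPS <= p + DT * a <= POS_HIGH + EPS
        for p, a in zip(pos, action)
    )


def replace_action(pos, action, rng, n_samples=100):
    """Action replacement: keep a safe action, otherwise sample a safe substitute."""
    if is_safe(pos, action):
        return list(action)
    for _ in range(n_samples):
        candidate = [rng.uniform(ACT_LOW, ACT_HIGH) for _ in action]
        if is_safe(pos, candidate):
            return candidate
    # Assumed failsafe: stand still (safe whenever pos is already inside the box).
    return [0.0 for _ in action]


def project_action(pos, action):
    """Action projection: return the closest safe action in Euclidean distance.

    With per-dimension box constraints and pos inside the box, the projection
    reduces to clipping each coordinate to its safe interval.
    """
    projected = []
    for p, a in zip(pos, action):
        lo = max(ACT_LOW, (POS_LOW - p) / DT)
        hi = min(ACT_HIGH, (POS_HIGH - p) / DT)
        projected.append(min(max(a, lo), hi))
    return projected


if __name__ == "__main__":
    pos, action = [0.75, 0.0], [1.0, 0.5]   # first coordinate would leave the box
    print(is_safe(pos, action))
    print(project_action(pos, action))
    print(replace_action(pos, action, random.Random(0)))
```

In this sketch, projection alters only the coordinate that violates the constraint and leaves the rest of the agent's intent intact, whereas replacement may return an action unrelated to the original. Realistic safe sets are rarely axis-aligned boxes, in which case projection generally requires solving a constrained optimization problem rather than clipping.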