Learning from Demonstration (LfD) is a powerful method for enabling robots to perform novel tasks: it is often more tractable for a non-roboticist end-user to demonstrate the desired skill, and for the robot to learn efficiently from the resulting data, than for a human to engineer a reward function from which the robot learns the skill via reinforcement learning (RL). Safety issues arise in modern LfD techniques, e.g., Inverse Reinforcement Learning (IRL), just as they do in RL; yet safe learning in LfD has received little attention. For agile robots, safety is especially vital because of the risk of robot-environment collision, robot-human collision, and damage to the robot. In this paper, we propose a safe IRL framework, CBFIRL, that leverages the Control Barrier Function (CBF) to enhance the safety of the IRL policy. The core idea of CBFIRL is to combine a loss function inspired by CBF requirements with the objective of an IRL method and to optimize the two jointly via gradient descent. In our experiments, the framework is safer than IRL methods without CBF, with improvements of $\sim15\%$ and $\sim20\%$ on two difficulty levels of a 2D racecar domain and $\sim50\%$ on a 3D drone domain.
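To make the core idea concrete, the following is a minimal sketch of how a CBF-inspired loss could be added to an IRL objective. The abstract does not give the exact loss, so everything here is an assumption for illustration: the hinge form of the three terms (a common choice in the learned-barrier-function literature), the finite-difference estimate of the barrier derivative, and the hypothetical names `cbf_hinge_loss`, `joint_loss`, `alpha`, `margin`, and `weight`. It is not the CBFIRL implementation.

```python
def relu(x):
    """Hinge: penalize only positive violations."""
    return max(0.0, x)


def mean(values):
    """Average of an iterable; 0.0 if it is empty."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def cbf_hinge_loss(h_safe, h_unsafe, h_curr, h_next,
                   dt=0.1, alpha=1.0, margin=0.0):
    """Assumed CBF-style loss built from barrier values h(x).

    h_safe   : barrier values at states labeled safe   (want h >= margin)
    h_unsafe : barrier values at states labeled unsafe (want h <= -margin)
    h_curr, h_next : barrier values at consecutive states of a trajectory,
                     used to approximate dh/dt by (h_next - h_curr) / dt
    """
    # Safe states should have a non-negative barrier value.
    loss_safe = mean(relu(margin - h) for h in h_safe)
    # Unsafe states should have a negative barrier value.
    loss_unsafe = mean(relu(margin + h) for h in h_unsafe)
    # Derivative condition: dh/dt + alpha * h >= 0 along trajectories.
    loss_deriv = mean(
        relu(margin - ((hn - hc) / dt + alpha * hc))
        for hc, hn in zip(h_curr, h_next)
    )
    return loss_safe + loss_unsafe + loss_deriv


def joint_loss(irl_loss, cbf_loss, weight=1.0):
    """Weighted sum that a gradient-based optimizer would minimize jointly."""
    return irl_loss + weight * cbf_loss


# Toy barrier values (made up for illustration).
cbf = cbf_hinge_loss(
    h_safe=[1.0, -0.5],     # second safe state violates h >= 0
    h_unsafe=[-1.0, 0.5],   # second unsafe state violates h <= 0
    h_curr=[1.0], h_next=[0.5],
    dt=0.5, alpha=0.5,
)
total = joint_loss(irl_loss=2.0, cbf_loss=cbf, weight=0.5)
print(cbf, total)
```

In an actual training loop the barrier values would come from a differentiable model evaluated on states visited by the policy, and `total` would be minimized with an automatic-differentiation framework; the scalar version above only shows how the terms combine.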