In safety-critical robot planning or control, manually specifying safety constraints or learning them from demonstrations can be challenging. In this article, we propose a certifiable alignment method that lets a robot learn a safety constraint in its model predictive control (MPC) policy from online human directional feedback. To our knowledge, it is the first method to learn safety constraints from human feedback. The method is based on an empirical observation: human directional feedback, when available, tends to guide the robot toward safer regions. It requires only the direction of the human feedback to update the learning hypothesis space. It is certifiable: it either provides an upper bound on the total number of human corrections needed in the case of successful learning, or declares hypothesis misspecification, i.e., that the true implicit safety constraint cannot be found within the specified hypothesis space. We evaluated the method using numerical examples and user studies in two simulation games. Additionally, we implemented and tested it on a real-world Franka robot arm performing mobile water-pouring tasks. The results demonstrate the efficacy and efficiency of our method, showing that it enables a robot to successfully learn safety constraints with a small number (tens) of human directional corrections.
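To make the "direction-only update with a certificate" idea concrete, the following is a minimal sketch and not the method of the article. It assumes the unknown constraint is described by a parameter vector `theta_true` inside a ball of radius `R`, keeps the hypothesis space as an ellipsoid, and shrinks it with the classic central-cut ellipsoid update each time a simulated human gives a direction. The margin `r`, the noise model, and the names `central_cut`, `max_corrections`, and `learn` are all hypothetical choices made for illustration. Under these assumptions, the volume argument of the ellipsoid method bounds the number of corrections by `2 n (n + 1) ln(R / r)`; exceeding that budget is reported as misspecification.

```python
import math
import numpy as np


def central_cut(c, P, g):
    """Central-cut ellipsoid update (requires dimension n >= 2).

    The ellipsoid is {theta : (theta - c)^T P^{-1} (theta - c) <= 1}.
    Keeps the half {theta : g @ (theta - c) <= 0} and returns the
    minimum-volume ellipsoid containing that half.
    """
    n = c.size
    g_tilde = g / math.sqrt(g @ P @ g)
    Pg = P @ g_tilde
    c_new = c - Pg / (n + 1)
    P_new = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg))
    return c_new, P_new


def max_corrections(n, R, r):
    """Correction budget from the ellipsoid volume argument (assumes a ball
    of radius r around the true parameter is consistent with every cut)."""
    return math.ceil(2 * n * (n + 1) * math.log(R / r))


def learn(theta_true, R=1.0, r=0.05, noise=0.2, seed=0):
    """Simulate learning from direction-only feedback.

    theta_true is used only to simulate the human; a real learner never
    sees it. Returns (estimate, number_of_corrections, status).
    """
    rng = np.random.default_rng(seed)
    n = theta_true.size
    c, P = np.zeros(n), (R**2) * np.eye(n)
    budget = max_corrections(n, R, r)
    k = 0
    # "Human stops correcting" is modeled as the estimate being within r.
    while np.linalg.norm(theta_true - c) > r:
        if k >= budget:
            return c, k, "misspecified"
        diff = theta_true - c
        # Assumed feedback model: a noisy direction that still points
        # toward the truth with margin r; otherwise use the exact one.
        d = diff + noise * np.linalg.norm(diff) * rng.standard_normal(n)
        if (d / np.linalg.norm(d)) @ diff < r:
            d = diff
        # Keep the half of the hypothesis space the direction points into.
        c, P = central_cut(c, P, -d)
        k += 1
    return c, k, "learned"


if __name__ == "__main__":
    theta = np.array([0.3, -0.2, 0.4])
    est, k, status = learn(theta)
    print(status, k, "budget:", max_corrections(3, 1.0, 0.05))
    # A true parameter outside the initial ball cannot be reached.
    print(learn(np.array([3.0, 0.0, 0.0]), noise=0.0)[2])
```

With `n = 3`, `R = 1`, and `r = 0.05` the budget is 72 corrections, which is in the "tens" range mentioned above, although that agreement is a property of this toy setup rather than evidence about the actual method.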