Ensuring robot safety is challenging: user-defined constraints can miss edge cases, policies can become unsafe even when trained on safe data, and safety can be subjective. We therefore learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in a learned latent space, that is guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient because it builds on nearest neighbor classification and avoids the withholding of data that is common with conformal prediction. By raising an alert when the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with a guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We also present an approach to policy improvement that avoids the suspected unsafe region. With it, we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
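As a rough illustration of how a warning region based on nearest neighbor distances might be calibrated with conformal prediction, the following Python sketch uses the standard split conformal procedure. This variant withholds a calibration set, which the method described above explicitly avoids, so it should be read as a simplified stand-in and not as the authors' algorithm. The function names, the synthetic 2-D states, and the 10% miss rate are all assumptions made for the example.

```python
import math
import numpy as np


def nn_distance(queries, references):
    """Euclidean distance from each query state to its nearest reference state."""
    diffs = queries[:, None, :] - references[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def calibrate_threshold(unsafe_fit, unsafe_cal, miss_rate):
    """Split conformal threshold on the nearest neighbor distance.

    If the calibration states and a future unsafe state are exchangeable,
    the future state's distance to `unsafe_fit` is at most the returned
    threshold with probability at least 1 - miss_rate.
    """
    scores = np.sort(nn_distance(unsafe_cal, unsafe_fit))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - miss_rate))
    if k > n:
        # Too few calibration states for this miss rate: warn everywhere.
        return math.inf
    return float(scores[k - 1])


def warn(state, unsafe_fit, threshold):
    """Alert if the state lies inside the suspected unsafe region."""
    return bool(nn_distance(state[None, :], unsafe_fit)[0] <= threshold)


# Synthetic stand-in for states that a human flagged as unsafe (assumption).
rng = np.random.default_rng(0)
unsafe_states = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(200, 2))

# Split conformal withholds half of the flagged states for calibration.
unsafe_fit, unsafe_cal = unsafe_states[:100], unsafe_states[100:]
threshold = calibrate_threshold(unsafe_fit, unsafe_cal, miss_rate=0.1)

near_alert = warn(unsafe_fit[0], unsafe_fit, threshold)
far_alert = warn(np.array([100.0, 100.0]), unsafe_fit, threshold)
print(f"threshold={threshold:.3f}, near={near_alert}, far={far_alert}")
```

The suspected unsafe region here is the union of balls of radius `threshold` around the flagged states in `unsafe_fit`. A smaller `miss_rate` yields a larger radius and therefore more frequent alerts.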