Interest in deploying reinforcement learning (RL) controllers in safety-critical applications, such as robot navigation around pedestrians, motivates the development of additional safety mechanisms. Running RL-enabled systems among uncertain dynamic agents may result in many collisions and frequent failures to reach the goal. The system could be safer if the pre-trained RL policy were uncertainty-informed. For that reason, we propose conformal predictive safety filters that: 1) predict the other agents' trajectories, 2) use statistical techniques to provide uncertainty intervals around these predictions, and 3) learn an additional safety filter that closely follows the RL controller but avoids the uncertainty intervals. We use conformal prediction to learn uncertainty-informed predictive safety filters, which make no assumptions about the agents' distribution. The framework is modular and outperforms existing controllers in simulation. We demonstrate our approach in multiple experiments in a collision avoidance gym environment and show that it minimizes the number of collisions without making overly conservative predictions.
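The three steps listed in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes 2D positions, a single-step prediction, and a held-out calibration set of predicted and observed agent positions. The function names `conformal_radius` and `filter_action` and all parameters are hypothetical. In particular, the abstract describes a learned safety filter, whereas this sketch substitutes a simple hand-written rule that scales the RL action down until the next robot position leaves the uncertainty region.

```python
import math
import numpy as np


def conformal_radius(cal_pred, cal_true, alpha=0.1):
    """Split conformal quantile of Euclidean prediction errors.

    cal_pred, cal_true: arrays of shape (n, 2) holding predicted and
    observed agent positions from a calibration set (assumed layout).
    If calibration and test errors are exchangeable, a new true position
    lies within the returned radius of its prediction with probability
    at least 1 - alpha.
    """
    scores = np.linalg.norm(np.asarray(cal_pred) - np.asarray(cal_true), axis=1)
    n = scores.shape[0]
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return float(np.sort(scores)[k - 1])


def filter_action(robot_pos, rl_action, agent_preds, radius, dt=0.1):
    """Toy stand-in for the safety filter (hand-written, not learned).

    Keeps the RL velocity command if the resulting next position stays
    outside every uncertainty disk; otherwise tries smaller scalings of
    the same command. Falls back to a zero command, which is only a
    placeholder and is not guaranteed to be safe.
    """
    robot_pos = np.asarray(robot_pos, dtype=float)
    rl_action = np.asarray(rl_action, dtype=float)
    agent_preds = np.asarray(agent_preds, dtype=float).reshape(-1, 2)
    for scale in (1.0, 0.75, 0.5, 0.25):
        action = scale * rl_action
        next_pos = robot_pos + dt * action
        dists = np.linalg.norm(agent_preds - next_pos, axis=1)
        if np.all(dists > radius):
            return action
    return np.zeros_like(rl_action)
```

The radius comes from calibration data only, so this step needs no assumption about the distribution of the other agents beyond exchangeability. A larger calibration set permits a smaller `alpha` before the radius becomes infinite.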