Interest in using reinforcement learning (RL) controllers in safety-critical applications, such as robot navigation around pedestrians, motivates the development of additional safety mechanisms. Running RL-enabled systems among uncertain dynamic agents may result in a high number of collisions and failures to reach the goal. The system could be safer if the pre-trained RL policy were uncertainty-informed. For that reason, we propose conformal predictive safety filters that: 1) predict the other agents' trajectories, 2) use statistical techniques to provide uncertainty intervals around these predictions, and 3) learn an additional safety filter that closely follows the RL controller but avoids the uncertainty intervals. We use conformal prediction to learn uncertainty-informed predictive safety filters, which make no assumptions about the agents' distribution. The framework is modular and outperforms existing controllers in simulation. We demonstrate our approach with multiple experiments in a collision avoidance gym environment and show that it minimizes the number of collisions without making overly conservative predictions.
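The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `conformal_radius` and `safe_action`, the single-integrator robot model, and the finite candidate-action set are all assumptions made for illustration, and the learned safety filter is replaced here by a plain search over candidates. The radius follows the standard split conformal rule, whose coverage guarantee relies on the calibration and test errors being exchangeable.

```python
import numpy as np


def conformal_radius(calib_errors, alpha):
    """Split conformal quantile of held-out prediction errors.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest error, so that a new
    exchangeable error falls below it with probability at least 1 - alpha.
    """
    scores = np.sort(np.asarray(calib_errors, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples to certify this alpha.
        return float("inf")
    return float(scores[k - 1])


def safe_action(rl_action, candidates, robot_pos, agent_preds, radius, dt=0.1):
    """Pick the candidate closest to the RL action that stays outside every
    uncertainty ball (hypothetical stand-in for the learned safety filter).

    Assumes single-integrator dynamics: next_pos = robot_pos + dt * action.
    Returns None when no candidate is safe.
    """
    rl_action = np.asarray(rl_action, dtype=float)
    robot_pos = np.asarray(robot_pos, dtype=float)
    agent_preds = np.atleast_2d(np.asarray(agent_preds, dtype=float))

    best, best_dist = None, np.inf
    for a in candidates:
        a = np.asarray(a, dtype=float)
        next_pos = robot_pos + dt * a
        # Distance from the robot's next position to each predicted agent position.
        clearances = np.linalg.norm(agent_preds - next_pos, axis=1)
        if np.all(clearances > radius):
            dist = np.linalg.norm(a - rl_action)
            if dist < best_dist:
                best, best_dist = a, dist
    return best
```

In this sketch, `calib_errors` would be the distances between predicted and observed agent positions on a held-out calibration set. A smaller `alpha` gives a larger radius and therefore a more conservative filter.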