Reinforcement learning algorithms need exploration to learn. However, unsupervised exploration prevents the deployment of such algorithms on safety-critical tasks and limits their real-world use. In this paper, we propose a new algorithm called Ensemble Model Predictive Safety Certification, which combines model-based deep reinforcement learning with tube-based model predictive control to correct the actions taken by a learning agent, keeping safety constraint violations to a minimum through planning. Our approach aims to reduce the amount of prior knowledge needed about the actual system, requiring only offline data generated by a safe controller. Our results show that we can achieve significantly fewer constraint violations than comparable reinforcement learning methods.
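To make the idea of correcting a learner's actions through planning more concrete, the following is a minimal sketch of a safety filter built on an ensemble of dynamics models. It is not the paper's algorithm: it replaces the optimization-based tube model predictive control with a simple accept-or-fallback check, and every name in it (`make_model`, `backup_policy`, `certify_action`, `tube_margin`, the scalar linear dynamics, and the box constraint) is a hypothetical choice made for illustration.

```python
import numpy as np


def make_model(a, b):
    """Hypothetical ensemble member: linear dynamics x' = a * x + b * u."""
    def model(x, u):
        return a * x + b * u
    return model


def backup_policy(x):
    """Hypothetical safe controller that pushes the state toward zero."""
    return -2.0 * x


def certify_action(state, proposed_action, models, backup,
                   horizon=10, state_limit=1.0, tube_margin=0.1):
    """Accept the proposed action only if every ensemble member predicts
    that applying it once, then following the backup controller, keeps the
    state inside the tightened constraint |x| <= state_limit - tube_margin.
    Otherwise return the backup action for the current state."""
    tightened = state_limit - tube_margin  # shrink the constraint by the tube margin
    for model in models:
        x = state
        action = proposed_action
        for _ in range(horizon):
            x = model(x, action)
            if np.any(np.abs(x) > tightened):
                return backup(state), False  # predicted violation: fall back
            action = backup(x)  # after the first step, follow the backup controller
    return proposed_action, True


# Assumed ensemble: three models that disagree slightly about the dynamics.
models = [make_model(a, 0.1) for a in (0.95, 1.0, 1.05)]
state = np.array([0.5])

mild_action, mild_ok = certify_action(state, np.array([1.0]), models, backup_policy)
risky_action, risky_ok = certify_action(state, np.array([5.0]), models, backup_policy)
```

In this sketch the mild proposal is accepted unchanged, while the aggressive one is rejected because at least one ensemble member predicts a state beyond the tightened bound. Checking every ensemble member is one simple way to account for model uncertainty; the tube margin plays the role of a buffer against prediction error.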