Deep reinforcement learning (DRL) is a promising method for learning robot control policies from demonstration and experience alone. Because the agent must actively explore to cover the robot's full dynamic behaviour, DRL training is typically performed in simulation. Although simulation training is cheap and fast, applying the resulting agents in the real world is difficult: even agents trained until they perform safely in simulation transfer poorly to physical systems because of the sim-to-real gap, the mismatch between the simulated dynamics and those of the physical robot. In this paper, we present a method for training a DRL agent online to drive autonomously on a physical vehicle using a model-based safety supervisor. The supervisor checks whether each action selected by the agent is safe and ensures that a safe action is always executed on the vehicle. This bypasses the sim-to-real problem while allowing the DRL algorithm to be trained safely, quickly, and efficiently. We compare our method with conventional learning in simulation and on a physical vehicle. In a variety of real-world experiments, we train a small-scale vehicle online to drive autonomously with no prior simulation training. The evaluation results show that our method trains agents with improved sample efficiency while never crashing, and that the trained agents demonstrate better driving performance than those trained in simulation.
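The supervisory idea described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the kinematic model, the fallback controller, the definition of "safe", and every constant and function name below (`step`, `fallback_action`, `is_safe`, `supervise`, `run_episode`) are assumptions made purely for illustration. The sketch treats an action as safe if, after taking it, a simple fallback controller can keep the modelled vehicle on the track for a short horizon; otherwise the fallback action is executed instead.

```python
import math
import random
from dataclasses import dataclass

# All constants are hypothetical values chosen for illustration only.
TRACK_HALF_WIDTH = 1.0  # [m] distance from centre line to track boundary
MAX_HEADING = 1.0       # [rad] largest heading error treated as safe
SPEED = 1.0             # [m/s] constant forward speed
DT = 0.1                # [s] control period
MAX_STEER = 2.0         # [rad/s] limit on the commanded heading rate
HORIZON = 30            # fallback steps simulated by the safety check


@dataclass(frozen=True)
class State:
    offset: float   # lateral offset from the centre line [m]
    heading: float  # heading error relative to the track [rad]


def clamp(value, limit):
    return max(-limit, min(limit, value))


def step(state, steer):
    """Toy kinematic model: the steering command sets the heading rate."""
    heading = state.heading + clamp(steer, MAX_STEER) * DT
    offset = state.offset + SPEED * math.sin(heading) * DT
    return State(offset, heading)


def fallback_action(state):
    """Hand-tuned linear feedback that steers back to the centre line."""
    return clamp(-2.0 * state.offset - 3.0 * state.heading, MAX_STEER)


def in_safe_set(state):
    return (abs(state.offset) <= TRACK_HALF_WIDTH
            and abs(state.heading) <= MAX_HEADING)


def is_safe(state, action):
    """Assumed safety criterion: after taking `action`, the fallback
    controller keeps the modelled vehicle in the safe set for HORIZON steps."""
    predicted = step(state, action)
    for _ in range(HORIZON):
        if not in_safe_set(predicted):
            return False
        predicted = step(predicted, fallback_action(predicted))
    return in_safe_set(predicted)


def supervise(state, agent_action):
    """Return (action to execute, whether the supervisor intervened)."""
    if is_safe(state, agent_action):
        return agent_action, False
    return fallback_action(state), True


def run_episode(num_steps=500, seed=0):
    """Drive with a random stand-in for the learning agent, under supervision."""
    rng = random.Random(seed)
    state = State(0.0, 0.0)
    max_offset, interventions = 0.0, 0
    for _ in range(num_steps):
        proposed = rng.uniform(-MAX_STEER, MAX_STEER)
        action, intervened = supervise(state, proposed)
        interventions += intervened
        state = step(state, action)
        max_offset = max(max_offset, abs(state.offset))
    return max_offset, interventions
```

Calling `run_episode()` returns the largest lateral offset reached and the number of supervisor interventions. The random policy stands in for an untrained agent that explores freely; the supervisor is what keeps the modelled vehicle within `TRACK_HALF_WIDTH`. In a real system the safety check would rely on a vehicle model and track map, which this sketch does not attempt to reproduce.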