Reinforcement learning has attracted considerable research interest as a basis for planning approaches in automated driving. Most prior works address the end-to-end planning task, which yields direct control commands, and rarely deploy their algorithms on real vehicles. In this work, we propose a method that employs a trained deep reinforcement learning policy for dedicated high-level behavior planning. By populating an abstract objective interface, established motion planning algorithms can be leveraged to derive smooth and drivable trajectories. Given the current environment model, we propose to use a built-in simulator to predict the traffic scene over a given horizon into the future. The behavior of automated vehicles in mixed traffic is determined by querying the learned policy. To the best of our knowledge, this work is the first to apply deep reinforcement learning in this manner, and as such no state-of-the-art benchmark is available. We therefore validate the proposed approach by comparing an idealistic single-shot plan with cyclic replanning through the learned policy. Experiments with a real test vehicle on proving grounds demonstrate the potential of our approach to narrow the simulation-to-real-world gap of deep reinforcement learning based planning approaches. Additional simulation analyses reveal that more complex multi-agent maneuvers can be managed by employing the cyclic replanning approach.
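To make the difference between a single-shot plan and cyclic replanning concrete, the following is a minimal Python sketch and not the implementation used in this work. All names (`Vehicle`, `toy_policy`, `step`, `rollout`, `cyclic_replanning`) are hypothetical. The rule-based `toy_policy` merely stands in for a trained deep reinforcement learning policy, and the constant-velocity `step` function stands in for the built-in simulator. As a simplifying assumption, only the ego vehicle (index 0) queries the policy, while all other vehicles keep their lane and speed. The returned behavior sequence is a stand-in for what would populate the abstract objective interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Vehicle:
    s: float   # longitudinal position [m]
    v: float   # speed [m/s]
    lane: int  # lane index, larger means further left


Scene = Tuple[Vehicle, ...]        # index 0 is the ego vehicle (assumption)
Policy = Callable[[Scene], str]    # maps a scene to a high-level ego behavior


def toy_policy(scene: Scene) -> str:
    """Hypothetical stand-in for the learned policy: change lane when a
    slower vehicle is less than 20 m ahead in the ego lane."""
    ego = scene[0]
    for other in scene[1:]:
        gap = other.s - ego.s
        if other.lane == ego.lane and 0 < gap < 20 and other.v < ego.v:
            return "change_left"
    return "keep_lane"


def step(scene: Scene, ego_action: str, dt: float) -> Scene:
    """Toy simulator step: constant speed for every vehicle, and the ego
    lane change is applied instantaneously (both are assumptions)."""
    ego = scene[0]
    lane = ego.lane + 1 if ego_action == "change_left" else ego.lane
    moved = [Vehicle(ego.s + ego.v * dt, ego.v, lane)]
    moved += [Vehicle(o.s + o.v * dt, o.v, o.lane) for o in scene[1:]]
    return tuple(moved)


def rollout(scene: Scene, policy: Policy, horizon_steps: int,
            dt: float = 1.0) -> List[str]:
    """Single-shot plan: predict the scene over the horizon and query the
    policy once per predicted step."""
    plan = []
    for _ in range(horizon_steps):
        action = policy(scene)
        plan.append(action)
        scene = step(scene, action, dt)
    return plan


def cyclic_replanning(observe: Callable[[int], Scene], policy: Policy,
                      cycles: int, horizon_steps: int) -> List[str]:
    """Cyclic replanning: plan again from a fresh observation in every
    cycle and execute only the first behavior of each plan."""
    executed = []
    for k in range(cycles):
        plan = rollout(observe(k), policy, horizon_steps)
        executed.append(plan[0])
    return executed


# Usage: ego at 10 m/s approaches a slower vehicle 15 m ahead in its lane.
scene0 = (Vehicle(0.0, 10.0, 0), Vehicle(15.0, 5.0, 0))
single_shot = rollout(scene0, toy_policy, horizon_steps=3)

# Observations per cycle are hard-coded here; on a vehicle they would come
# from the current environment model.
scenes = [scene0, (Vehicle(10.0, 10.0, 1), Vehicle(20.0, 5.0, 0))]
replanned = cyclic_replanning(lambda k: scenes[k], toy_policy,
                              cycles=2, horizon_steps=3)
```

In this sketch the single-shot plan relies entirely on the predicted scene, whereas cyclic replanning restarts the prediction from each new observation, so deviations between prediction and reality do not accumulate over the horizon.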