Advanced vehicle control is a fundamental building block in the development of autonomous driving systems. Reinforcement learning (RL) promises control performance superior to classical approaches while keeping computational demands low during deployment. However, standard RL approaches such as soft actor-critic (SAC) require large amounts of training data, which makes them impractical for real-world application. To address this issue, we apply recently developed data-efficient deep RL methods to vehicle trajectory control. Our investigation focuses on three methods not previously explored for vehicle control: randomized ensemble double Q-learning (REDQ), probabilistic ensembles with trajectory sampling combined with a model predictive path integral optimizer (PETS-MPPI), and model-based policy optimization (MBPO). We find that the standard model-based RL formulation used in approaches like PETS-MPPI and MBPO is not suitable for trajectory control. We therefore propose a new formulation that splits dynamics prediction and vehicle localization. Our benchmark study on the CARLA simulator reveals that the three data-efficient deep RL approaches learn control strategies on a par with or better than SAC, yet reduce the required number of environment interactions by more than one order of magnitude.
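As a rough illustration of where the data efficiency of REDQ comes from, the sketch below computes a REDQ-style bootstrapped target for a single transition: the minimum is taken over a random subset of an ensemble of target Q-estimates, and the SAC entropy term is subtracted. This follows the generally published description of REDQ, not necessarily the configuration used in this study; the function name `redq_target`, its parameters, and the default values are hypothetical and chosen only for illustration.

```python
import random


def redq_target(reward, done, next_q_values, next_log_prob,
                gamma=0.99, alpha=0.2, subset_size=2, rng=None):
    """Return a REDQ-style target for one transition (illustrative sketch).

    next_q_values: target-network estimates Q_i(s', a') from each of the N
                   ensemble members, with a' sampled from the current policy.
    next_log_prob: log pi(a' | s') for that sampled action.
    done:          1.0 if s' is terminal, else 0.0.
    All default hyperparameters are assumptions, not values from the paper.
    """
    rng = rng or random.Random()
    n = len(next_q_values)
    if not 1 <= subset_size <= n:
        raise ValueError("subset_size must be between 1 and the ensemble size")

    # Pick a random subset of ensemble members and take the most
    # pessimistic estimate among them to limit overestimation bias.
    subset = rng.sample(range(n), subset_size)
    min_q = min(next_q_values[i] for i in subset)

    # Soft value as in SAC: subtract the entropy-weighted log-probability.
    soft_value = min_q - alpha * next_log_prob

    # No bootstrapping past terminal states.
    return reward + gamma * (1.0 - done) * soft_value


if __name__ == "__main__":
    target = redq_target(reward=0.5, done=0.0,
                         next_q_values=[1.0, 2.0, 3.0],
                         next_log_prob=-1.0, gamma=0.9,
                         rng=random.Random(0))
    print(target)
```

In REDQ, every ensemble member is regressed toward this shared target, and many such updates are performed per environment step, which is what allows far fewer environment interactions than plain SAC.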