Roundabout traffic scenarios pose substantial complexity for automated driving. Manually mapping all possible scenarios into a state space is labor-intensive and challenging. Deep reinforcement learning (DRL), with its ability to learn from interaction with the environment, is a promising approach to training such automated driving models. This study implements and compares three DRL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and Trust Region Policy Optimization (TRPO), to train automated vehicles to drive through roundabouts. The driving state space, action space, and reward function are designed for this task; the reward function accounts for safety, efficiency, comfort, and energy consumption to align with real-world requirements. All three algorithms succeed in enabling automated vehicles to drive through the roundabout. To evaluate their performance holistically, this study establishes an evaluation methodology based on multiple indicators, including safety, efficiency, and comfort level, and develops a method based on the Analytic Hierarchy Process (AHP) to weight these indicators. Experimental results across various testing scenarios show that TRPO outperforms DDPG and PPO in safety and efficiency, while PPO performs best in comfort level. Lastly, to verify the model's adaptability and robustness in other driving scenarios, the model trained with TRPO is deployed in a range of different testing scenarios, e.g., highway driving and merging. The results show that the TRPO model, trained only on roundabout scenarios, exhibits a certain degree of proficiency in highway driving and merging. This study provides a foundation for applying DRL-based automated driving in real traffic environments.
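As an illustration of how the Analytic Hierarchy Process can turn pairwise importance judgments into indicator weights, the following is a minimal sketch only. It uses the row geometric-mean approximation of AHP, which may differ from the exact procedure used in this study. The pairwise judgments, the indicator scores, and the function names `ahp_weights` and `overall_score` are hypothetical and are not taken from the paper.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important indicator i is than
    indicator j (reciprocal matrix, ones on the diagonal).
    """
    n = len(pairwise)
    # Geometric mean of each row, then normalize so the weights sum to 1.
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def overall_score(indicator_scores, weights):
    """Weighted sum of normalized indicator scores (each assumed in [0, 1])."""
    return sum(s * w for s, w in zip(indicator_scores, weights))


# Hypothetical judgments for (safety, efficiency, comfort):
# safety is judged 2x as important as efficiency and 4x as important as comfort.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical normalized scores of one trained policy on the three indicators.
score = overall_score([0.9, 0.7, 0.6], weights)
```

For a perfectly consistent judgment matrix such as the one above, the geometric-mean method gives the same weights as the principal-eigenvector method. In practice, a consistency check on the judgment matrix would usually accompany this step.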