Developing and testing automated driving models in the real world can be challenging and even dangerous; simulation helps mitigate these risks, especially for difficult maneuvers. Deep reinforcement learning (DRL) can tackle complex decision-making and control tasks by learning through interaction with the environment, which makes it well suited to automated driving, yet it has not been explored in detail for this purpose. This work implements, evaluates, and compares two DRL algorithms, Deep Q-Networks (DQN) and Trust Region Policy Optimization (TRPO), for training automated driving models on the highway-env simulation platform. Effective, customized reward functions were developed, and the implemented algorithms were evaluated in terms of on-lane accuracy (how well the car stays on the road within its lane), efficiency (how fast the car drives), safety (how likely the car is to crash into obstacles), and comfort (how much the car jerks, e.g., by suddenly accelerating or braking). Results show that the TRPO-based models with modified reward functions delivered the best performance in most cases. Furthermore, to train a uniform driving model that can handle a variety of driving maneuvers rather than only specific ones, this study extended highway-env with an additional customized training environment, ComplexRoads, which integrates various driving maneuvers and multiple road scenarios. Models trained on ComplexRoads adapt well to other driving maneuvers, with promising overall performance. Lastly, several functionalities were added to highway-env to support this work. The code is available on GitHub at https://github.com/alaineman/drlcarsim-paper.
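To make the four evaluation criteria concrete, the following is a minimal sketch of how they could be computed from a per-step episode log. It is an illustration under stated assumptions, not the definitions used in this study: the `Step` record, the `episode_metrics` helper, the time step `dt`, and the choice of mean absolute jerk as a comfort proxy are all hypothetical and are not taken from highway-env or from the paper's code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of a hypothetical episode log (not a highway-env type)."""
    on_lane: bool   # True if the vehicle is inside its lane at this step
    speed: float    # longitudinal speed in m/s
    crashed: bool   # True if a collision occurred at this step


def episode_metrics(steps: List[Step], dt: float = 1.0) -> dict:
    """Summarize one episode with four illustrative indicators (assumed definitions)."""
    if len(steps) < 3:
        raise ValueError("need at least three steps to estimate jerk")
    n = len(steps)
    speeds = [s.speed for s in steps]
    # Acceleration: first finite difference of speed.
    accels = [(speeds[i + 1] - speeds[i]) / dt for i in range(n - 1)]
    # Jerk: finite difference of acceleration.
    jerks = [(accels[i + 1] - accels[i]) / dt for i in range(n - 2)]
    return {
        "on_lane_rate": sum(s.on_lane for s in steps) / n,          # on-lane accuracy
        "mean_speed": sum(speeds) / n,                              # efficiency
        "crashed": any(s.crashed for s in steps),                   # safety
        "mean_abs_jerk": sum(abs(j) for j in jerks) / len(jerks),   # comfort (lower is smoother)
    }


# Usage on a tiny made-up log of four steps.
log = [
    Step(True, 20.0, False),
    Step(True, 22.0, False),
    Step(False, 22.0, False),
    Step(True, 20.0, False),
]
metrics = episode_metrics(log)
print(metrics)
```

Averaging these indicators over many evaluation episodes would give one row per trained model, which is one plausible way to compare DQN-based and TRPO-based agents on the same criteria.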