Developing and testing automated driving models in the real world can be challenging and even dangerous; simulation helps mitigate this, especially for difficult maneuvers. Deep reinforcement learning (DRL) can tackle complex decision-making and control tasks by learning through interaction with the environment, which makes it well suited to automated driving, yet it has not been explored in detail for this purpose. This study implements, evaluates, and compares two DRL algorithms, Deep Q-Networks (DQN) and Trust Region Policy Optimization (TRPO), for training automated driving models on the highway-env simulation platform. Effective, customized reward functions were developed, and the implemented algorithms were evaluated in terms of on-lane accuracy (how well the car stays within its lane on the road), efficiency (how fast the car drives), safety (how likely the car is to crash into obstacles), and comfort (how much the car jerks, e.g., through sudden acceleration or braking). Results show that the TRPO-based models with modified reward functions delivered the best performance in most cases. Furthermore, to train a uniform driving model that can handle a variety of driving maneuvers rather than only specific ones, this study extended highway-env with an additional customized training environment, ComplexRoads, which integrates various driving maneuvers and multiple road scenarios. Models trained on ComplexRoads adapt well to other driving maneuvers with promising overall performance. Lastly, several functionalities were added to highway-env to support this work. The code is openly available on GitHub at https://github.com/alaineman/drlcarsim.
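
To make the four evaluation criteria concrete, the following is a minimal illustrative sketch of how such metrics could be computed from a logged episode. It is not the implementation used in this study: the `Step` record, its fields, the `evaluate_episode` function, and the use of mean absolute jerk as a comfort proxy are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One step of a hypothetical episode log (all fields are assumptions)."""
    on_lane: bool   # True if the car is within its lane at this step
    speed: float    # longitudinal speed in m/s
    crashed: bool   # True if a collision occurred at this step


def evaluate_episode(steps: List[Step], dt: float = 1.0) -> Dict[str, object]:
    """Summarize one episode along the four criteria named in the abstract."""
    if not steps:
        raise ValueError("episode must contain at least one step")
    n = len(steps)

    # On-lane accuracy: fraction of steps spent inside the lane.
    on_lane_rate = sum(s.on_lane for s in steps) / n

    # Efficiency: average speed over the episode.
    mean_speed = sum(s.speed for s in steps) / n

    # Safety: whether any collision happened during the episode.
    crashed = any(s.crashed for s in steps)

    # Comfort: mean absolute jerk, using finite differences of speed
    # (speed -> acceleration -> jerk). Lower values mean a smoother ride.
    accel = [(steps[i + 1].speed - steps[i].speed) / dt for i in range(n - 1)]
    jerk = [(accel[i + 1] - accel[i]) / dt for i in range(len(accel) - 1)]
    mean_abs_jerk = sum(abs(j) for j in jerk) / len(jerk) if jerk else 0.0

    return {
        "on_lane_rate": on_lane_rate,
        "mean_speed": mean_speed,
        "crashed": crashed,
        "mean_abs_jerk": mean_abs_jerk,
    }
```

In practice, such per-episode summaries would be averaged over many evaluation episodes, so that the safety criterion becomes a crash rate rather than a single boolean.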