Comprehensive Training and Evaluation on Deep Reinforcement Learning for Automated Driving in Various Simulated Driving Maneuvers

Developing and testing automated driving models in the real world can be challenging and even dangerous; simulation offers a safer alternative, especially for difficult maneuvers. Deep reinforcement learning (DRL) can tackle complex decision-making and control tasks by learning through interaction with the environment. This makes it well suited to automated driving, yet its use in this domain has not been explored in detail. This study implements, evaluates, and compares two DRL algorithms, Deep Q-Networks (DQN) and Trust Region Policy Optimization (TRPO), for training automated driving on the highway-env simulation platform. Effective, customized reward functions were developed. The implemented algorithms were evaluated on four criteria: on-lane accuracy (how well the car stays within its lane on the road), efficiency (how fast the car drives), safety (how likely the car is to crash into obstacles), and comfort (how much the car jerks, e.g., by suddenly accelerating or braking). Results show that the TRPO-based models with modified reward functions delivered the best performance in most cases. Furthermore, this study aimed to train a uniform driving model that can handle various driving maneuvers beyond the specific ones. To that end, it extended highway-env with an additional customized training environment, ComplexRoads, which integrates various driving maneuvers and multiple road scenarios. Models trained on ComplexRoads adapt well to other driving maneuvers, with promising overall performance. Lastly, several functionalities were added to highway-env to implement this work. The code is available on GitHub at https://github.com/alaineman/drlcarsim.
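The four evaluation criteria above are described only informally. As an illustration, the following minimal sketch shows one plausible way to score a single logged episode on all four. It is not the authors' implementation and not part of the highway-env API. The `Step` record, its field names, the fixed time step `dt`, the binary safety score, and the finite-difference jerk estimate are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Step:
    """One logged simulation step (hypothetical log format, not highway-env's)."""
    on_lane: bool   # ego vehicle is inside its lane boundaries at this step
    speed: float    # ego speed in m/s
    crashed: bool   # collision flag at this step


def evaluate_episode(steps: list[Step], dt: float = 1.0) -> dict[str, float]:
    """Score one episode on the four criteria (assumed definitions)."""
    if not steps:
        raise ValueError("episode has no steps")
    speeds = [s.speed for s in steps]
    # Acceleration by finite differences of speed, then jerk by finite
    # differences of acceleration (an assumed proxy for comfort).
    accels = [(b - a) / dt for a, b in zip(speeds, speeds[1:])]
    jerks = [(b - a) / dt for a, b in zip(accels, accels[1:])]
    return {
        # Fraction of steps spent within the lane.
        "on_lane_accuracy": sum(s.on_lane for s in steps) / len(steps),
        # Mean speed as an efficiency measure.
        "efficiency": mean(speeds),
        # 1.0 if the episode ended without any collision, else 0.0.
        "safety": 0.0 if any(s.crashed for s in steps) else 1.0,
        # Lower mean absolute jerk means a more comfortable ride.
        "mean_abs_jerk": mean(abs(j) for j in jerks) if jerks else 0.0,
    }


if __name__ == "__main__":
    episode = [
        Step(True, 20.0, False),
        Step(True, 22.0, False),
        Step(False, 23.0, False),
        Step(True, 23.0, False),
    ]
    print(evaluate_episode(episode))
```

For the sample episode, three of four steps are on-lane (0.75), the mean speed is 22.0 m/s, no crash occurs (safety 1.0), and the mean absolute jerk is 1.0 m/s³. Reporting comfort as a raw jerk value rather than a normalized score keeps the sketch simple. A real evaluation would average these values over many episodes.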

翻译：在真实环境中开发和测试自动驾驶模型可能具有挑战性甚至危险性，而仿真技术能够为此提供帮助，尤其对于复杂驾驶操作。深度强化学习通过与环境交互和学习，具备处理复杂决策与控制任务的潜力，因此适用于自动驾驶开发，但该领域尚缺乏深入探索。本研究通过实施、评估和比较两种深度强化学习算法——深度Q网络（DQN）与信任域策略优化（TRPO），在highway-env仿真平台上开展自动驾驶训练的综合性研究。我们设计了高效且定制化的奖励函数，并从车道保持精度（车辆在车道内行驶的准确性）、行驶效率（车辆速度表现）、安全性（车辆碰撞风险）及舒适性（车辆急加速或急刹车等顿挫程度）四个维度评估算法性能。结果表明，采用改进奖励函数的TRPO模型在多数场景中表现最佳。此外，为训练能应对多种驾驶操作（而非特定操作）的通用驾驶模型，本研究扩展了highway-env平台，开发了名为ComplexRoads的定制化训练环境，该环境整合了多种驾驶操作与多道路场景。在ComplexRoads环境中训练的模型能良好适应其他驾驶操作，且整体性能优异。最后，本研究为highway-env添加了多项功能以实现相关工作。代码已在GitHub开源：https://github.com/alaineman/drlcarsim。

相关内容

Automator

关注 5

Automator是苹果公司为他们的Mac OS X系统开发的一款软件。 只要通过点击拖拽鼠标等操作就可以将一系列动作组合成一个工作流，从而帮助你自动的（可重复的）完成一些复杂的工作。Automator还能横跨很多不同种类的程序，包括：查找器、Safari网络浏览器、iCal、地址簿或者其他的一些程序。它还能和一些第三方的程序一起工作，如微软的Office、Adobe公司的Photoshop或者Pixelmator等。

O’Reilly报告：知识图谱崛起——面向现代数据集成和数据结构体系，“The Rise of the Knowledge Graph——Toward Modern Data Integration and the Data Fabric Architecture”

专知会员服务

49+阅读 · 2022年2月18日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日