Autonomous driving is an emerging technology that has advanced rapidly over the last decade. Modern transportation is expected to benefit greatly from an intelligent decision-making framework for autonomous vehicles, through improved mobility, lower risk, and shorter travel time. However, existing methods either ignore environmental complexity and therefore fit only straight roads, or ignore the impact on surrounding vehicles during optimization, leading to weak environmental adaptability and incomplete optimization objectives. To address these limitations, we propose a parameterized decision-making framework with multi-modal perception based on deep reinforcement learning, called AUTO. We conduct a comprehensive perception to capture the state features of the various traffic participants around the autonomous vehicle, and on this basis we design a graph-based model to learn a state representation of the multi-modal semantic features. To distinguish between lane-following and lane-changing, we decompose an action of the autonomous vehicle into a parameterized action structure that first decides whether to change lanes and then computes the exact action to execute. A hybrid reward function accounts for safety, traffic efficiency, passenger comfort, and impact on surrounding vehicles, and guides the framework to generate optimal actions. In addition, we design a regularization term and a multi-worker paradigm to enhance training. Extensive experiments offer evidence that AUTO can advance the state of the art in terms of both macroscopic and microscopic effectiveness.
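To make the parameterized action structure and the hybrid reward concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify the action parameters or how the reward terms are combined, so every name here is hypothetical: the behaviour labels, the use of a clipped acceleration as the continuous parameter, the `max_accel` bound, and the weighted sum with its weights are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence

# Hypothetical discrete lane behaviours (illustrative labels, not from the paper).
LANE_BEHAVIOURS = ("keep", "left", "right")


@dataclass
class ParameterizedAction:
    behaviour: str       # discrete part: follow the lane or change to a neighbour
    acceleration: float  # continuous part: assumed here to be an acceleration in m/s^2


def select_action(behaviour_scores: Dict[str, float],
                  behaviour_params: Dict[str, float],
                  max_accel: float = 3.0) -> ParameterizedAction:
    """Two-step selection over a parameterized action space.

    Step 1 picks the discrete behaviour with the highest score.
    Step 2 reads the continuous parameter attached to that behaviour
    and clips it to the assumed range [-max_accel, max_accel].
    """
    behaviour = max(LANE_BEHAVIOURS, key=lambda b: behaviour_scores[b])
    accel = max(-max_accel, min(max_accel, behaviour_params[behaviour]))
    return ParameterizedAction(behaviour, accel)


def hybrid_reward(safety: float, efficiency: float, comfort: float, impact: float,
                  weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Combine the four reward aspects as a weighted sum (an assumed form)."""
    terms = (safety, efficiency, comfort, impact)
    return sum(w * t for w, t in zip(weights, terms))


if __name__ == "__main__":
    # Made-up scores and parameters standing in for network outputs.
    action = select_action(
        behaviour_scores={"keep": 0.2, "left": 0.7, "right": 0.1},
        behaviour_params={"keep": 0.5, "left": 4.2, "right": -1.0},
    )
    print(action)
    print(hybrid_reward(1.0, 0.5, -0.2, -0.1, weights=(2.0, 1.0, 1.0, 1.0)))
```

In a learned agent, the scores and per-behaviour parameters would come from the policy and value networks rather than hand-written dictionaries; the sketch only shows how a discrete decision and its continuous parameter fit together in one action.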