Deep reinforcement learning (DRL) lets a system learn, through interaction with its environment, a policy that maximizes a user-defined reward. In autonomous driving, DRL can serve as a strategy for high-level decision making, whereas low-level algorithms such as the hybrid A* path planner have proven effective for local trajectory planning. In this work, we combine the two: the DRL agent makes high-level decisions such as lane change commands, and given a lane change command, the hybrid A* planner generates a collision-free trajectory that is executed by a model predictive controller (MPC). In addition, the DRL algorithm keeps the lane change command consistent within a chosen time period. Traffic rules are expressed in linear temporal logic (LTL), which then serves as a reward function for the DRL agent. Finally, we validate the proposed method on a real system to demonstrate its feasibility from simulation to implementation on real hardware.
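As an illustration of two ideas in the abstract, the following minimal sketch shows one possible way to hold a lane change command for a fixed number of decision steps and to turn a temporal-logic-style traffic rule into a reward. It is not the implementation used in this work. The names `CommandHolder`, `always`, and `rule_reward`, the example rule "whenever the vehicle is changing lanes, the gap must be safe", the state keys, and the penalty value are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

KEEP, LEFT, RIGHT = "keep", "left", "right"
State = Dict[str, bool]


@dataclass
class CommandHolder:
    """Latches a lane change command for `hold_steps` decision steps (hypothetical helper)."""
    hold_steps: int
    _active: Optional[str] = None
    _remaining: int = 0

    def step(self, proposed: str) -> str:
        # While a lane change is latched, ignore new proposals from the policy.
        if self._remaining > 0:
            self._remaining -= 1
            return self._active
        # Latch a new lane change; the current step counts as the first held step.
        if proposed != KEEP:
            self._active = proposed
            self._remaining = self.hold_steps - 1
            return proposed
        return KEEP


def always(prop: Callable[[State], bool], trace: List[State]) -> bool:
    """Finite-trace reading of the LTL operator G: `prop` holds in every state."""
    return all(prop(state) for state in trace)


def rule_reward(trace: List[State], penalty: float = -1.0) -> float:
    """Assumed rule G(changing_lane -> safe_gap); returns 0.0 if satisfied, else `penalty`."""
    def rule(state: State) -> bool:
        return (not state["changing_lane"]) or state["safe_gap"]
    return 0.0 if always(rule, trace) else penalty


if __name__ == "__main__":
    holder = CommandHolder(hold_steps=3)
    proposals = [KEEP, LEFT, RIGHT, KEEP, KEEP, RIGHT]
    print([holder.step(p) for p in proposals])

    trace = [
        {"changing_lane": False, "safe_gap": False},
        {"changing_lane": True, "safe_gap": True},
    ]
    print(rule_reward(trace))
```

In this sketch the held command would be what the hybrid A* planner receives, so the planner is not asked to reverse a lane change halfway through. The binary reward is only one option; a graded penalty based on how strongly a rule is violated would be an equally plausible design.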