Traffic Shaping and Hysteresis Mitigation Using Deep Reinforcement Learning in a Connected Driving Environment

This paper proposes a multi-agent deep reinforcement learning framework for traffic shaping. Its key advantage over existing congestion management strategies is the ability to mitigate hysteresis phenomena. Whereas existing strategies focus on preventing breakdown, the proposed framework remains highly effective after breakdown has formed. The framework assumes partial connectivity among the automated vehicles, which share information, and requires only a basic level of autonomy defined by one-dimensional longitudinal control. It is built primarily on a centralized-training, centralized-execution multi-agent deep reinforcement learning approach, in which longitudinal control is defined by acceleration or deceleration commands that all agents execute uniformly. The model used to train and test the framework is based on the well-known Double Deep Q-Learning algorithm; it takes the average state of flow within the traffic stream as input and outputs actions in the form of acceleration or deceleration values. We demonstrate the model's ability to shape the state of traffic, mitigate the negative effects of hysteresis, and even improve traffic flow beyond its original level. We also identify the minimum percentage of connected automated vehicles (CAVs) required to shape the traffic successfully, assuming that CAVs are uniformly distributed within the loop system. The framework not only shows the theoretical applicability of reinforcement learning to such challenges but also proposes a realistic solution that requires only partial connectivity and continuous monitoring of the average speed of the system, which can be achieved with readily available sensors that measure the speeds of vehicles in reasonable proximity to the CAVs.
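
To make the control loop described above concrete, the following is a minimal sketch of the two steps it names: computing a Double Deep Q-Learning target and broadcasting one greedy longitudinal command to every CAV. It is not the authors' implementation. The action set `ACTIONS`, the function names, and the representation of Q-values as plain lists are all assumptions made for illustration; in practice the Q-values would come from online and target neural networks fed with the average state of the traffic stream.

```python
from typing import List, Sequence

# Hypothetical discrete action set: acceleration commands in m/s^2.
# The abstract does not state the actual values; these are assumed.
ACTIONS: List[float] = [-1.0, 0.0, 1.0]


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(
    reward: float,
    gamma: float,
    q_online_next: Sequence[float],
    q_target_next: Sequence[float],
    done: bool,
) -> float:
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = argmax(q_online_next)  # selection by the online network
    return reward + gamma * q_target_next[best_action]  # evaluation by the target network


def broadcast_action(q_online_now: Sequence[float], n_cavs: int) -> List[float]:
    """Centralized execution: choose one greedy command and apply it to all CAVs."""
    command = ACTIONS[argmax(q_online_now)]
    return [command] * n_cavs


if __name__ == "__main__":
    # Illustrative Q-values only; real values would be network outputs.
    y = double_dqn_target(1.0, 0.9, [0.2, 0.8, 0.5], [1.0, 2.0, 3.0], done=False)
    print(round(y, 6))
    print(broadcast_action([0.1, 0.3, 0.2], n_cavs=3))
```

In this sketch the online estimate picks the second action, so the target uses the target network's value for that action rather than the target network's own maximum. That decoupling of selection from evaluation is what distinguishes Double Deep Q-Learning from the standard Deep Q-Learning target.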
