Traffic Signal Control (TSC) aims to reduce the average travel time of vehicles in a road network, which in turn improves fuel efficiency, air quality, and road safety, benefiting society as a whole. Because long-horizon control and coordination are complex, most prior TSC methods use deep reinforcement learning (RL) to search for a control policy and have achieved great success. However, TSC still faces two significant challenges. 1) A vehicle's travel time gives only delayed feedback on how effective the TSC policy is at each intersection, since it becomes available only after the vehicle has left the road network. Several heuristic reward functions have been proposed as substitutes for travel time, but they are usually biased and do not guide the policy to improve in the correct direction. 2) The traffic condition at each intersection is influenced by non-local intersections, since vehicles traverse multiple intersections over time. The TSC agent must therefore use both its local observation and the non-local traffic conditions to comprehensively predict the long-horizon traffic conditions at each intersection. To address these challenges, we propose DenseLight, a novel RL-based TSC method that employs an unbiased reward function to provide dense feedback on policy effectiveness, and a non-local enhanced TSC agent to better predict future traffic conditions for more precise traffic control. Extensive experiments and ablation studies demonstrate that DenseLight consistently outperforms advanced baselines on various road networks with diverse traffic flows. The code is available at https://github.com/junfanlin/DenseLight.