Reinforcement learning (RL) has become a promising approach to traffic signal control (TSC). However, most RL-based methods focus solely on optimization within simulators and give little thought to real-world deployment. Online RL-based methods must interact with the environment in order to learn, and such interaction is limited in the real world. Acquiring a dataset for offline RL is also challenging in the real world. Moreover, most real-world intersections prefer a cyclical phase structure. To address these challenges, we propose: (1) a cyclical offline dataset (COD), designed around common real-world scenarios so that it is easy to collect; (2) an offline RL model called DataLight, capable of learning satisfactory control strategies from the COD; and (3) a method called Arbitrary To Cyclical (ATC), which can convert most RL-based methods to cyclical signal control. Extensive simulator experiments on real-world datasets demonstrate that: (1) DataLight outperforms most existing methods and achieves results comparable to the best-performing method; (2) some recent RL-based methods achieve satisfactory performance when ATC is introduced; and (3) COD is reliable, with DataLight remaining robust even with a small amount of data. These results suggest that a cyclical offline dataset may be sufficient for offline RL in TSC. Our proposed methods help bridge the gap between simulation experiments and real-world applications. Our code is released on GitHub.