The optimization of traffic signal control (TSC) is critical for an efficient transportation system. In recent years, reinforcement learning (RL) techniques have emerged as a popular approach for TSC and show promising results for highly adaptive control. However, existing RL-based methods suffer from notably poor real-world applicability and have seen hardly any successful deployments. These failures stem mostly from the reliance on over-idealized traffic simulators for policy optimization, and from the use of unrealistic, fine-grained state observations and reward signals that are not directly obtainable from real-world sensors. In this paper, we propose a fully Data-Driven and simulator-free framework for realistic Traffic Signal Control (D2TSC). Specifically, we combine well-established traffic flow theory with machine learning to construct a reward inference model that infers reward signals from coarse-grained traffic data. With the inferred rewards, we further propose a sample-efficient offline RL method that learns signal control policies directly from historical offline datasets of real-world intersections. To evaluate our approach, we collect historical traffic data from a real-world intersection and develop a highly customized simulation environment that strictly follows the characteristics of the real data. Extensive experiments demonstrate that our approach outperforms conventional and offline RL baselines and offers much better real-world applicability.