The application of reinforcement learning to traffic signal control (TSC) has been extensively researched and has yielded notable achievements. However, most existing TSC methods assume that traffic data from all surrounding intersections is fully and continuously available through sensors. In real-world deployments, this assumption often fails because of sensor malfunctions or data loss, which makes TSC with missing data a critical challenge. To meet the needs of practical applications, we introduce DiffLight, a novel conditional diffusion model for TSC in data-missing scenarios under the offline setting. Specifically, we integrate two essential sub-tasks, traffic data imputation and decision-making, through a Partial Rewards Conditioned Diffusion (PRCD) model, which prevents missing rewards from interfering with the learning process. Meanwhile, to effectively capture the spatial-temporal dependencies among intersections, we design a Spatial-Temporal transFormer (STFormer) architecture. In addition, we propose a Diffusion Communication Mechanism (DCM) to promote better communication and control performance under data-missing scenarios. Extensive experiments on five datasets with various data-missing scenarios demonstrate that DiffLight is an effective controller for TSC with missing data. The code of DiffLight is released at https://github.com/lokol5579/DiffLight-release.
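The abstract describes PRCD only at a high level. The sketch below illustrates one general way to keep missing rewards out of a diffusion model's conditioning signal. It is not the DiffLight implementation. It assumes that rewards arrive as a sequence with `None` marking missing entries, and it borrows classifier-free guidance as the conditioning scheme. The helper names `reward_condition` and `guided_noise` and the `guidance_w` default are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def reward_condition(rewards: Sequence[Optional[float]]) -> Tuple[float, bool]:
    """Average only the observed rewards and report whether any were observed.

    Hypothetical helper: missing rewards (None) are excluded rather than
    imputed as zeros, so they cannot bias the conditioning value.
    """
    observed = [r for r in rewards if r is not None]
    if not observed:
        # Placeholder value; the caller should use the unconditional branch.
        return 0.0, False
    return sum(observed) / len(observed), True


def guided_noise(eps_uncond: List[float], eps_cond: List[float],
                 has_condition: bool, guidance_w: float = 1.5) -> List[float]:
    """Combine two noise predictions in the style of classifier-free guidance.

    Hypothetical helper: when no reward was observed, return the
    unconditional prediction so a missing reward never steers sampling.
    """
    if not has_condition:
        return list(eps_uncond)
    return [u + guidance_w * (c - u) for u, c in zip(eps_uncond, eps_cond)]
```

For example, `reward_condition([-1.0, None, -3.0])` yields `(-2.0, True)`, while a segment whose rewards are all missing yields `(0.0, False)` and is sampled unconditionally. Falling back to the unconditional prediction, instead of substituting a default reward, is the design choice that keeps absent rewards from acting as false training or guidance signals.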