Reinforcement learning (RL) methods for traffic signal control have achieved better performance than conventional rule-based approaches. Most RL approaches require the agent to observe the environment in order to decide which action is optimal for the long-term reward. In real-world urban scenarios, however, observations of traffic states are frequently missing because sensors are lacking, which makes existing RL methods inapplicable to road networks with missing observations. In this work, we aim to control traffic signals in a real-world setting where some intersections in the road network have no sensors installed and therefore no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this setting. Specifically, we propose two solutions: the first imputes the traffic states to enable adaptive control, and the second imputes both states and rewards to enable adaptive control as well as the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we show that our method outperforms conventional approaches and performs consistently across different missing rates. We also investigate further how missing data influences the performance of our model.
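To make the idea of state imputation concrete, the following is a minimal sketch, not the imputation model used in this work. It assumes, purely for illustration, that each intersection's state is a fixed-length vector (for example, per-lane vehicle counts), that at least one intersection is observed, and that an unobserved intersection can be approximated by the mean of its observed neighbors. The names `impute_states`, `observed`, and `neighbors` are hypothetical.

```python
from typing import Dict, List, Optional

State = List[float]  # hypothetical layout, e.g. per-lane vehicle counts


def impute_states(
    observed: Dict[str, Optional[State]],
    neighbors: Dict[str, List[str]],
) -> Dict[str, State]:
    """Fill missing intersection states with the mean of observed neighbors.

    Assumption (not from the paper): neighbor averaging stands in for
    whatever imputation model is actually used. Intersections with no
    observed neighbor fall back to an all-zero state.
    """
    # Infer the state dimension from any observed intersection.
    dim = next(len(s) for s in observed.values() if s is not None)

    filled: Dict[str, State] = {}
    for node, state in observed.items():
        if state is not None:
            filled[node] = list(state)  # keep real observations unchanged
            continue
        # Collect the states of neighbors that do have sensors.
        known = [
            observed[n]
            for n in neighbors.get(node, [])
            if observed.get(n) is not None
        ]
        if known:
            # Element-wise mean over the observed neighbor states.
            filled[node] = [sum(vals) / len(known) for vals in zip(*known)]
        else:
            filled[node] = [0.0] * dim
    return filled


if __name__ == "__main__":
    obs = {"A": [4.0, 2.0], "B": None, "C": [2.0, 6.0]}
    adj = {"B": ["A", "C"]}
    print(impute_states(obs, adj)["B"])
```

The imputed states could then be passed to a control policy in place of the missing observations. Imputing rewards for training, as in the second solution, would require a separate estimator that this sketch does not cover.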