Reinforcement learning (RL) methods for traffic signal control have achieved better performance than conventional rule-based approaches. Most RL approaches require observations of the environment so that the agent can decide which action is optimal for long-term reward. However, in real-world urban scenarios, traffic-state observations are frequently missing because sensors are absent, which makes existing RL methods inapplicable to road networks with missing observations. In this work, we aim to control traffic signals in a real-world setting where some intersections in the road network have no sensors installed and thus no direct observations around them. To the best of our knowledge, we are the first to use RL methods to tackle the traffic signal control problem in this setting. Specifically, we propose two solutions. The first imputes the traffic states to enable adaptive control. The second imputes both states and rewards to enable adaptive control as well as the training of RL agents. Through extensive experiments on both synthetic and real-world road network traffic, we show that our methods outperform conventional approaches and perform consistently across different missing rates. We also further investigate how missing data influences the performance of our model.