Efficient traffic signal control is critical for reducing congestion and improving overall transportation efficiency. The dynamic nature of traffic flow has prompted researchers to explore Reinforcement Learning (RL) for traffic signal control (TSC). RL-based solutions have outperformed traditional methods, but their real-world application is limited by low sample efficiency and high computational requirements. In this work, we propose DTLight, a simple yet powerful lightweight Decision Transformer-based TSC method that learns a policy from easily accessible offline datasets. DTLight takes the novel approach of using knowledge distillation to learn a lightweight controller from a well-trained larger teacher model, reducing the computation required at deployment. It also integrates adapter modules to reduce the cost of fine-tuning, which makes DTLight practical for online adaptation with minimal computation and only a few fine-tuning steps during real deployment. DTLight is further enhanced to be more applicable to real-world TSC problems. Extensive experiments on synthetic and real-world scenarios show that DTLight pre-trained purely on offline datasets can outperform state-of-the-art online RL-based methods in most scenarios. Experimental results also show that online fine-tuning further improves the performance of DTLight by up to 42.6% over the best online RL baseline methods. We also introduce DTRL, a collection of Datasets specifically designed for TSC with offline RL. Our datasets and code are publicly available.
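To make the knowledge-distillation idea in the abstract concrete, the following is a minimal sketch that assumes both teacher and student controllers output logits over a discrete set of signal phases. It uses the standard temperature-softened KL-divergence loss, which may differ from the objective DTLight actually uses. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum_exp = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic soft-label distillation loss (an assumption for illustration),
    not necessarily the loss used by DTLight.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (learner)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Hypothetical logits over four signal phases at one intersection.
teacher = [2.0, 0.5, -1.0, 0.1]
student = [1.0, 1.0, 0.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's softened action distribution and grows as the two diverge, so minimizing it pushes the smaller model toward the teacher's phase preferences.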