Although Reinforcement Learning (RL)-based Traffic Signal Control (TSC) methods have been extensively studied, their practical deployment still faces serious issues such as high learning cost and poor generalizability. This is because the ``trial-and-error'' training style makes RL agents extremely dependent on their specific traffic environment and requires a long convergence time. To address these issues, we propose a novel Federated Imitation Learning (FIL)-based framework for multi-intersection TSC, named FitLight, which allows RL agents to be plugged into any traffic environment without additional pre-training cost. Unlike existing imitation learning approaches that rely on pre-training RL agents with demonstrations, FitLight supports real-time imitation learning and a seamless transition to reinforcement learning. Thanks to our proposed knowledge-sharing mechanism and novel hybrid pressure-based agent design, RL agents can quickly find a near-optimal control policy within only a few episodes. Moreover, for resource-constrained TSC scenarios, FitLight supports model pruning and heterogeneous model aggregation, such that RL agents can run on a micro-controller with merely 16{\it KB} RAM and 32{\it KB} ROM. Extensive experiments demonstrate that, compared to state-of-the-art methods, FitLight not only provides a superior starting point but also converges to a better final solution on both real-world and synthetic datasets, even under extreme resource limitations.