Recent reinforcement learning-based approaches have significantly improved traffic signal control by enabling better cooperation among multiple traffic lights. However, one issue persists: how can we obtain a multi-agent traffic signal control algorithm that transfers well across diverse cities? In this paper, we propose X-Light, a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control. The model takes full Markov Decision Process trajectories as input: the Lower Transformer aggregates the states, actions, and rewards of the target intersection and its neighbors within a city, and the Upper Transformer learns general decision trajectories across different cities. This dual-level approach strengthens the model's generalization and transferability. Notably, when transferred directly to unseen scenarios, X-Light surpasses all baseline methods by +7.91% on average, and by as much as +16.3% in some cases, yielding the best results.
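To make the dual-level aggregation concrete, the following is a minimal sketch of the idea, not the X-Light implementation. It assumes that each intersection at each timestep is summarized by a small feature vector of the form [state, action, reward], that index 0 in each timestep is the target intersection, and that a single attention head with identity projections stands in for the learned Transformer layers. The function names (`attend`, `lower_level`, `upper_level`) and the toy data are hypothetical. The upper level here attends over the timestep sequence only; training across multiple cities is omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, tokens):
    """Scaled dot-product attention of one query vector over a list of tokens.

    Assumption: identity projections replace the learned Q/K/V weights,
    so the tokens serve as both keys and values.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, tok)) / scale for tok in tokens]
    weights = softmax(scores)
    return [
        sum(w * tok[i] for w, tok in zip(weights, tokens))
        for i in range(len(query))
    ]


def lower_level(step_tokens):
    """Aggregate one timestep: the target intersection (index 0) attends
    over itself and its neighbors."""
    return attend(step_tokens[0], step_tokens)


def upper_level(trajectory):
    """Aggregate over time: the latest lower-level output attends over
    the lower-level outputs of all timesteps."""
    per_step = [lower_level(step) for step in trajectory]
    return attend(per_step[-1], per_step)


# Toy trajectory (made-up numbers): 3 timesteps, 1 target + 2 neighbors,
# each token is [state, action, reward].
trajectory = [
    [[0.2, 1.0, -0.5], [0.4, 0.0, -0.3], [0.1, 1.0, -0.8]],
    [[0.3, 0.0, -0.4], [0.5, 1.0, -0.2], [0.2, 0.0, -0.6]],
    [[0.1, 1.0, -0.1], [0.3, 1.0, -0.4], [0.4, 0.0, -0.7]],
]

embedding = upper_level(trajectory)
print(len(embedding))  # one value per feature dimension
```

Because each level computes a softmax-weighted average, the resulting embedding stays within the range of the input features. In a trained model, learned projections and feed-forward layers would replace the identity mappings assumed here.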