Reinforcement learning-based approaches have significantly improved the effectiveness of traffic light control, largely through better cooperation among multiple traffic lights. However, one issue persists: how can we obtain a multi-agent traffic signal control algorithm that transfers well across diverse cities? In this paper, we propose X-Light, a Transformer on Transformer (TonT) model for cross-city meta multi-agent traffic signal control. The model takes full Markov Decision Process (MDP) trajectories as input: the Lower Transformer aggregates the states, actions, and rewards of the target intersection and its neighbors within a city, and the Upper Transformer learns general decision trajectories across different cities. This two-level design strengthens the model's generalization and transferability. Notably, when transferred directly to unseen scenarios, X-Light surpasses all baseline methods by 7.91% on average, and even by 16.3% in some cases, yielding the best results.
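To make the two-level idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes single-head scaled dot-product attention with identity query/key/value projections (no learned weights), a token built by concatenating state, action, and reward, and a causal mask over time steps in the upper level. The names `make_token`, `lower_level`, and `upper_level`, the mask, and all dimensions are hypothetical choices for illustration; the abstract does not specify them.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, causal=False):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), so there are no learned parameters here.
    """
    d = len(tokens[0])
    out = []
    for i, q in enumerate(tokens):
        # With a causal mask, position i only attends to positions 0..i.
        visible = tokens[: i + 1] if causal else tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in visible]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, visible)) for j in range(d)])
    return out


def make_token(state, action, reward):
    """Hypothetical token layout: state features, then action id, then reward."""
    return list(state) + [float(action), float(reward)]


def lower_level(step_tokens):
    """Mix the target intersection (index 0) with its neighbors at one time step."""
    return self_attention(step_tokens)[0]


def upper_level(trajectory):
    """Encode a trajectory: one lower-level embedding per step, then attention over time."""
    step_embeddings = [lower_level(step) for step in trajectory]
    return self_attention(step_embeddings, causal=True)


# Toy input: 5 time steps, 1 target + 4 neighbors, 4 state features each.
random.seed(0)
trajectory = [
    [
        make_token([random.random() for _ in range(4)], random.randrange(8), -random.random())
        for _ in range(5)
    ]
    for _ in range(5)
]
encoded = upper_level(trajectory)
print(len(encoded), len(encoded[0]))  # 5 6
```

Each output row is one time step of the target intersection's trajectory after neighbor information has been mixed in by the lower level and earlier steps by the upper level. A real model would add learned projections, multiple heads, positional information, and a policy head on top.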