Adaptive traffic signal control with Multi-Agent Reinforcement Learning (MARL) is an active research topic. In most existing methods, each agent controls a single intersection, and the methods focus on cooperation between intersections. However, the non-stationarity of MARL still limits their performance as the traffic network grows. One compromise is to assign a region of intersections to each agent, which reduces the number of agents. This strategy poses two challenges: how to partition a traffic network into small regions, and how to search for the optimal joint action for a region of intersections. In this paper, we propose a novel training framework, RegionLight, in which the region partition rule is based on the adjacency between intersections, and we extend the Branching Dueling Q-Network (BDQ) to a Dynamic Branching Dueling Q-Network (DBDQ) to bound the growth of the joint action space and to alleviate the bias introduced by imaginary intersections outside the boundary of the traffic network. Our experiments on both real and synthetic datasets demonstrate that our framework performs best among the compared frameworks and that our region partition rule is robust.
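To make the joint-action-space argument concrete, the following minimal Python sketch compares the number of outputs a single Q-network would need for a flat joint action with the number needed when each intersection gets its own branch, and shows per-branch greedy selection with a mask for imaginary intersections. It is an illustration only, not the RegionLight implementation: the function names, the region size of 4, the 8 phases per intersection, and the masking scheme are all assumptions made for this example.

```python
from typing import List, Optional, Sequence


def joint_action_count(num_intersections: int, phases_per_intersection: int) -> int:
    """Size of a flat joint action space: one output per combination of phases."""
    return phases_per_intersection ** num_intersections


def branching_output_count(num_intersections: int, phases_per_intersection: int) -> int:
    """Number of outputs with one branch per intersection: grows linearly."""
    return num_intersections * phases_per_intersection


def greedy_branch_actions(
    branch_q_values: Sequence[Sequence[float]],
    active_mask: Sequence[bool],
) -> List[Optional[int]]:
    """Pick the arg-max phase independently in every active branch.

    Branches whose mask entry is False stand for imaginary intersections
    (an assumption of this sketch) and yield None instead of an action.
    """
    actions: List[Optional[int]] = []
    for q_row, active in zip(branch_q_values, active_mask):
        if not active:
            actions.append(None)
            continue
        # Index of the largest Q-value; ties resolve to the lowest index.
        actions.append(max(range(len(q_row)), key=q_row.__getitem__))
    return actions


if __name__ == "__main__":
    # Hypothetical region: 4 intersections, 8 signal phases each.
    print(joint_action_count(4, 8))      # 4096
    print(branching_output_count(4, 8))  # 32

    # Three branches with two phases each; the second branch is masked out.
    q_values = [[0.1, 0.9], [0.5, 0.2], [0.3, 0.3]]
    mask = [True, False, True]
    print(greedy_branch_actions(q_values, mask))  # [1, None, 0]
```

Under these assumed numbers, the flat formulation needs 4096 outputs while the branching one needs 32, which is the kind of bounded growth the abstract refers to. Skipping masked branches is one plausible way to keep imaginary intersections from influencing action selection.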