Traffic signal control based on multi-agent reinforcement learning (MARL) has become a popular research topic in recent years. Most existing MARL approaches tend to learn the optimal control strategies in a decentralised manner, relying on communication among neighbouring intersections. However, the non-stationarity inherent in MARL may lead to extremely slow convergence, or even failure to converge, especially when the number of intersections is large. One existing remedy is to partition the whole network into several regions, each of which uses a centralised RL framework to speed up convergence. This strategy faces two challenges: how to obtain a flexible partition, and how to search for the optimal joint action for a region of intersections. In this paper, we propose a novel training framework in which the region partitioning rule is based on the adjacency between intersections, together with a Dynamic Branching Dueling Q-Network (DBDQ) that searches for the optimal joint action efficiently and maximises the regional reward. Experimental results on both real and synthetic datasets demonstrate the superiority of our framework over other existing frameworks.
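To illustrate why searching for a joint action over a region is costly, and how a branching dueling architecture can sidestep the cost, a minimal sketch follows. It shows only the generic branching dueling idea (one shared state value plus one advantage vector per intersection, with the mean advantage subtracted in each branch); the dynamic component of DBDQ is not described in the abstract and is not reproduced here. The region size, the number of phases, the random advantage values, and the function names `branch_q_values` and `greedy_joint_action` are all assumptions made for illustration.

```python
import random


def branch_q_values(state_value, advantages):
    """Dueling aggregation per branch: Q_d(a) = V + A_d(a) - mean(A_d)."""
    q_branches = []
    for adv in advantages:
        mean_adv = sum(adv) / len(adv)
        q_branches.append([state_value + a - mean_adv for a in adv])
    return q_branches


def greedy_joint_action(q_branches):
    """Pick the argmax independently in every branch (one per intersection)."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_branches]


# Hypothetical region: 4 intersections, 8 signal phases each.
num_intersections, num_phases = 4, 8

# Stand-in for network outputs; a trained model would produce these values.
rng = random.Random(0)
state_value = 0.5
advantages = [
    [rng.uniform(-1.0, 1.0) for _ in range(num_phases)]
    for _ in range(num_intersections)
]

q_branches = branch_q_values(state_value, advantages)
joint_action = greedy_joint_action(q_branches)

# A flat centralised Q-network needs one output per joint action,
# whereas a branching network needs one output per (intersection, phase) pair.
joint_space = num_phases ** num_intersections
branch_outputs = num_phases * num_intersections

print("joint action:", joint_action)
print("flat outputs:", joint_space, "branching outputs:", branch_outputs)
```

Under these assumptions the flat joint action space grows exponentially with the number of intersections in a region, while the number of branching outputs grows only linearly, which is what makes the per-branch argmax an efficient substitute for an exhaustive joint search.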