Traffic signal control based on Multi-agent Reinforcement Learning (MARL) has become a popular research topic in recent years. Most existing MARL approaches learn the optimal control strategies in a decentralised manner by considering communication among neighbouring intersections. However, the non-stationarity inherent in MARL may lead to extremely slow convergence, or even failure to converge, especially when the number of intersections becomes large. One existing remedy is to partition the whole network into several regions, each of which uses a centralized RL framework to speed up convergence. This strategy faces two challenges: how to obtain a flexible partition, and how to search for the optimal joint action for a region of intersections. In this paper, we propose a novel training framework in which regions are partitioned according to the adjacency between intersections, together with a Dynamic Branching Dueling Q-Network (DBDQ) that searches for the optimal joint action efficiently and maximizes the regional reward. Experimental results on both real and synthetic datasets demonstrate the superiority of our framework over existing frameworks.
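To illustrate why a branching dueling architecture can make the joint-action search tractable, the following is a minimal sketch of the generic branching dueling aggregation, not the authors' DBDQ implementation. It assumes one action branch per intersection, a shared state value, and per-branch advantages that are mean-centred before being added to the state value. The function names, the numeric values, and the reading of "dynamic" as "the number of branches may differ between regions" are all assumptions made for illustration.

```python
import numpy as np


def branch_q_values(state_value, advantages):
    """Per-branch Q-values: Q_d(s, a) = V(s) + A_d(s, a) - mean_a A_d(s, a).

    `advantages` is a list with one 1-D array per intersection (branch);
    branches may have different numbers of signal phases.
    """
    return [state_value + a - a.mean() for a in advantages]


def greedy_joint_action(state_value, advantages):
    """Pick the greedy joint action by taking an argmax in each branch.

    This costs sum(n_d) evaluations instead of prod(n_d) for a flat
    Q-function over the joint action space.
    """
    q_per_branch = branch_q_values(state_value, advantages)
    return tuple(int(np.argmax(q)) for q in q_per_branch)


# Hypothetical region with three intersections having 4, 2 and 3 phases.
state_value = 0.5
advantages = [
    np.array([0.1, 0.9, -0.3, 0.0]),
    np.array([1.0, 0.2]),
    np.array([-1.0, -0.5, 2.0]),
]

joint_action = greedy_joint_action(state_value, advantages)
branch_outputs = sum(len(a) for a in advantages)          # 4 + 2 + 3
joint_actions = int(np.prod([len(a) for a in advantages]))  # 4 * 2 * 3
print(joint_action, branch_outputs, joint_actions)
```

In this toy region the network would need 9 branch outputs, whereas a flat Q-function over joint actions would need 24, and the gap grows exponentially with the number of intersections in a region.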