Multi-agent reinforcement learning (MARL) has emerged as a promising paradigm for adaptive traffic signal control (ATSC) across multiple intersections. Existing approaches typically follow either a fully centralized or a fully decentralized design. Fully centralized approaches suffer from the curse of dimensionality and rely on a single learning server, whereas purely decentralized approaches operate under severe partial observability and lack explicit coordination, resulting in suboptimal performance. These limitations motivate region-based MARL, in which the network is partitioned into regions of tightly coupled intersections and training is organized around these regions. This paper introduces a Semi-Centralized Training, Decentralized Execution (SEMI-CTDE) architecture for multi-intersection ATSC. Within each region, SEMI-CTDE performs centralized training with regional parameter sharing and employs composite state and reward formulations that jointly encode local and regional information. The architecture is highly transferable across different policy backbones and state-reward instantiations. Building on this architecture, we implement two models with distinct design objectives. A multi-perspective experimental analysis of the two SEMI-CTDE-based models, covering ablations of the architecture's core elements and comparisons against rule-based and fully decentralized baselines, shows that they consistently outperform these baselines and remain effective across a wide range of traffic densities and distributions.
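The abstract does not spell out how the composite state and reward are built, so the following is only a minimal sketch of one way such formulations could look. Everything in it is an assumption made for illustration: the region layout `REGIONS`, the use of an element-wise mean as the regional summary, the mixing weight `alpha`, and all function names. It is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical region layout: region id -> intersection ids (not from the paper).
REGIONS: Dict[str, List[str]] = {"r0": ["a", "b"], "r1": ["c"]}


def shared_policy_key(agent: str) -> str:
    """Return the region whose shared parameters this agent would use."""
    for region_id, members in REGIONS.items():
        if agent in members:
            return region_id
    raise KeyError(agent)


def composite_state(local_obs: Dict[str, List[float]],
                    region: List[str], agent: str) -> List[float]:
    """Concatenate the agent's own observation with a regional summary.

    Assumption: the regional summary is the element-wise mean of all
    observations in the region; the paper may encode it differently.
    """
    dim = len(local_obs[agent])
    regional = [sum(local_obs[i][k] for i in region) / len(region)
                for k in range(dim)]
    return local_obs[agent] + regional


def composite_reward(local_reward: Dict[str, float],
                     region: List[str], agent: str,
                     alpha: float = 0.5) -> float:
    """Blend local and regional reward; alpha is an assumed mixing weight."""
    regional = sum(local_reward[i] for i in region) / len(region)
    return alpha * local_reward[agent] + (1.0 - alpha) * regional
```

In this sketch, agents in the same region map to the same policy key, which stands in for regional parameter sharing, while each agent still acts on its own composite state at execution time.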