Perimeter Control with Heterogeneous Metering Rates for Cordon Signals: A Physics-Regularized Multi-Agent Reinforcement Learning Approach

Perimeter Control (PC) strategies have been proposed to manage urban road networks under oversaturated conditions. They regulate the transfer flows across the boundary of the Protected Network (PN) based on the Macroscopic Fundamental Diagram (MFD). Most existing studies apply a uniform metering rate to all cordon signals. This overlooks the variance of local traffic states at the intersection level, which may cause severe local congestion and degrade network stability. PC strategies with heterogeneous metering rates for cordon signals allow more precise control of the perimeter, but the complexity of the problem grows exponentially with the scale of the PN. This paper uses a Multi-Agent Reinforcement Learning (MARL)-based traffic signal control framework to decompose this PC problem, with heterogeneous metering rates for cordon signals, into multi-agent cooperation tasks. Each agent controls an individual signal on the cordon, which reduces the dimension of each controller's action space compared with centralized methods. A physics regularization approach for the MARL framework is proposed to make the distributed cordon signal controllers aware of the global network state by encoding MFD-based knowledge into the action-value functions of the local agents. The proposed PC strategy operates as a two-stage system: a feedback PC strategy detects the overall traffic state within the PN and then distributes local instructions to the cordon signal controllers in the MARL framework via the physics regularization. In numerical tests with different demand patterns in a microscopic traffic simulation environment, the proposed PC strategy shows promising robustness and transferability. It outperforms state-of-the-art feedback PC strategies in increasing network throughput, decreasing distributed delay for gate links, and reducing carbon emissions.
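The two-stage structure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. `joint_action_count` shows why a centralized controller struggles: with n cordon signals and m phases each, the joint action space has m^n entries, while each decentralized agent chooses among only m. The upper stage uses a discrete-time PI gating regulator of the kind commonly used in MFD-based feedback perimeter control; its gains, the critical accumulation `n_target` and the inflow bounds are hypothetical values. The abstract does not give the form of the physics regularization. So `split_inflow`, which splits the permitted inflow in proportion to gate queues, and `regularized_greedy_action`, which uses an absolute-deviation penalty, are hypothetical stand-ins chosen only to show how a global MFD-derived target could steer local agents.

```python
from dataclasses import dataclass


def joint_action_count(num_signals: int, phases_per_signal: int) -> int:
    """Size of a centralized controller's joint action space (m ** n)."""
    return phases_per_signal ** num_signals


@dataclass
class PIGatingController:
    """Upper stage: discrete-time PI regulator on PN accumulation n (veh).

    q(k) = q(k-1) - K_P * (n(k) - n(k-1)) + K_I * (n_target - n(k)),
    clipped to [q_min, q_max]. All parameter values are assumptions.
    """
    k_p: float
    k_i: float
    n_target: float  # critical accumulation read off the MFD (assumed known)
    q_min: float     # minimum permitted total inflow (veh/h)
    q_max: float     # maximum permitted total inflow (veh/h)
    q_prev: float    # previous permitted inflow (veh/h)
    n_prev: float    # previous accumulation (veh)

    def step(self, n: float) -> float:
        q = self.q_prev - self.k_p * (n - self.n_prev) + self.k_i * (self.n_target - n)
        q = min(max(q, self.q_min), self.q_max)
        # Storing the clipped value acts as simple anti-windup.
        self.q_prev, self.n_prev = q, n
        return q


def split_inflow(q_total: float, queues: list[float], capacities: list[float]) -> list[float]:
    """HYPOTHETICAL: split the permitted PN inflow across cordon gates in
    proportion to their queues, capped by each gate's capacity (veh/h).
    Inflow removed by a cap is not reallocated in this sketch."""
    total_queue = sum(queues)
    if total_queue == 0:
        shares = [1.0 / len(queues)] * len(queues)
    else:
        shares = [q / total_queue for q in queues]
    return [min(q_total * s, c) for s, c in zip(shares, capacities)]


def regularized_greedy_action(q_values: list[float], action_inflows: list[float],
                              target_inflow: float, lam: float) -> int:
    """HYPOTHETICAL physics-regularized action choice for one cordon agent:
    penalize each action's Q-value by how far the inflow it implies lies
    from the gate's MFD-derived target, then act greedily."""
    scores = [q - lam * abs(f - target_inflow) for q, f in zip(q_values, action_inflows)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    print(joint_action_count(10, 4))  # 1048576 joint actions vs. 4 per agent
    ctrl = PIGatingController(k_p=20.0, k_i=5.0, n_target=3000.0, q_min=500.0,
                              q_max=6000.0, q_prev=3000.0, n_prev=3100.0)
    q_total = ctrl.step(3150.0)  # 1250.0 veh/h permitted into the PN
    targets = split_inflow(q_total, [30, 10, 10], [800.0, 800.0, 800.0])
    # Unregularized greedy choice would be action 1 (highest Q-value);
    # the penalty steers this agent toward action 2, closer to its target.
    print(regularized_greedy_action([1.0, 1.2, 0.9], [0.0, 900.0, 400.0], targets[1], 0.002))
```

Splitting the total inflow before the agents act keeps every local decision tied to one network-level quantity. The penalty weight `lam` trades local Q-value maximization against adherence to that quantity. Setting it to zero recovers fully independent greedy agents.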
