A Constrained Multi-Agent Reinforcement Learning Approach to Autonomous Traffic Signal Control

Traffic congestion in modern cities is exacerbated by the limitations of traditional fixed-time traffic signal systems, which cannot adapt to dynamic traffic patterns. Adaptive Traffic Signal Control (ATSC) algorithms have emerged as a solution that adjusts signal timing dynamically based on real-time traffic conditions. However, the main limitation of such methods is that they do not transfer well to environments subject to real-world constraints, such as balancing efficiency, minimizing collisions, and ensuring fairness across intersections. In this paper, we formulate the ATSC problem as a constrained multi-agent reinforcement learning (MARL) problem and propose a novel algorithm, Multi-Agent Proximal Policy Optimization with Lagrange Cost Estimator (MAPPO-LCE), to produce effective traffic signal control policies. Our approach uses the method of Lagrange multipliers to balance rewards against constraints, together with a cost estimator that keeps this adjustment stable. We also introduce three constraints on the traffic network: GreenTime, GreenSkip, and PhaseSkip, which penalize traffic policies that do not conform to real-world scenarios. Our experimental results on three real-world datasets demonstrate that MAPPO-LCE outperforms three baseline MARL algorithms across all environments and traffic constraints (improving on MAPPO by 12.60%, IPPO by 10.29%, and QTRAN by 13.10%). These results show that constrained MARL is a valuable tool for traffic planners to deploy scalable and efficient ATSC methods in real-world traffic networks. We provide code at https://github.com/Asatheesh6561/MAPPO-LCE.
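To make the Lagrangian idea concrete, the following is a minimal, hypothetical sketch of how a Lagrange multiplier can trade off reward against a constraint cost inside a PPO-style update. It is not the MAPPO-LCE implementation (the repository above is the authoritative source). The abstract does not say how the cost estimator works, so the sketch assumes an exponential moving average of episode costs, and the names `LagrangeMultiplier`, `lagrangian_advantage`, `cost_limit`, `lr`, and `ema_decay` are illustrative inventions. Dividing the combined advantage by `1 + lambda` is one common normalization choice, assumed here rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LagrangeMultiplier:
    """Hypothetical sketch: one Lagrange multiplier for one cost constraint,
    updated by projected dual ascent on a smoothed cost estimate."""

    cost_limit: float                      # allowed average cost per episode (assumed)
    lr: float = 0.05                       # dual step size (assumed)
    ema_decay: float = 0.9                 # smoothing of the cost estimator (assumed)
    value: float = 0.0                     # current multiplier lambda, kept >= 0
    cost_estimate: Optional[float] = None  # running estimate of the episode cost

    def update(self, episode_cost: float) -> float:
        # Assumed cost estimator: exponential moving average of observed episode
        # costs, so a single noisy episode cannot swing the multiplier sharply.
        if self.cost_estimate is None:
            self.cost_estimate = episode_cost
        else:
            self.cost_estimate = (
                self.ema_decay * self.cost_estimate
                + (1.0 - self.ema_decay) * episode_cost
            )
        # Projected dual ascent: increase lambda while the estimated cost exceeds
        # the limit, decrease it otherwise, and clip at zero.
        violation = self.cost_estimate - self.cost_limit
        self.value = max(0.0, self.value + self.lr * violation)
        return self.value


def lagrangian_advantage(reward_adv: float, cost_adv: float, lam: float) -> float:
    """Single advantage for the PPO surrogate: A_r - lambda * A_c, divided by
    (1 + lambda) so its scale stays comparable as lambda grows (assumed choice)."""
    return (reward_adv - lam * cost_adv) / (1.0 + lam)


if __name__ == "__main__":
    # Hypothetical setup: one multiplier per constraint named in the abstract,
    # each fed an invented episode cost for illustration.
    multipliers = {name: LagrangeMultiplier(cost_limit=1.0)
                   for name in ("GreenTime", "GreenSkip", "PhaseSkip")}
    observed = {"GreenTime": 3.0, "GreenSkip": 0.5, "PhaseSkip": 1.0}
    for name, m in multipliers.items():
        print(name, round(m.update(observed[name]), 3))
```

In this toy run only the GreenTime multiplier becomes positive, because only its estimated cost exceeds the limit. The penalty then flows into `lagrangian_advantage`, pushing the policy away from actions that incur that cost. Smoothing the cost before the dual step is one plausible reading of "stable adjustment": the multiplier reacts to sustained violations rather than to single noisy episodes.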
