Existing traffic signal control systems rely on oversimplified rule-based methods, and even methods based on reinforcement learning (RL) are often suboptimal and unstable. To address this, we propose a cooperative multi-objective architecture called Multi-Objective Multi-Agent Deep Deterministic Policy Gradient (MOMA-DDPG), which estimates multiple reward terms for traffic signal control optimization using age-decaying weights. Our approach uses two types of agents: local agents, each of which optimizes traffic at its own intersection, and a global agent that optimizes overall traffic throughput. Although it includes a global agent, our solution remains decentralized because this agent is no longer needed at the inference stage. We evaluate our method on real-world traffic data collected from traffic cameras in an Asian country. The results show that MOMA-DDPG outperforms state-of-the-art methods across all performance metrics. In addition, the proposed system minimizes both waiting time and carbon emissions. Notably, this paper is the first to link carbon emissions and global agents in traffic signal control.