Adaptive traffic signal control (ATSC) is crucial for alleviating congestion, maximizing throughput, and promoting sustainable mobility in ever-expanding cities. Multi-Agent Reinforcement Learning (MARL) has recently shown significant potential in addressing complex traffic dynamics, but partial observability and coordination in decentralized environments remain key challenges in formulating scalable and efficient control strategies. To address these challenges, we present CoordLight, a MARL-based framework designed to improve intra-neighborhood traffic by enhancing decision-making at individual junctions (agents) as well as their coordination with neighboring agents, thereby scaling up to network-level traffic optimization. Specifically, we introduce Queue Dynamic State Encoding (QDSE), a novel state representation based on vehicle queuing models, which strengthens the agents' ability to analyze, predict, and respond to local traffic dynamics. We further propose a MARL algorithm named Neighbor-aware Policy Optimization (NAPO). It integrates an attention mechanism that discerns the state and action dependencies among adjacent agents, aiming to facilitate more coordinated decision-making and to improve policy updates through robust advantage calculation. This enables agents to identify and prioritize crucial interactions with influential neighbors, thus enhancing targeted coordination and collaboration among agents. Through comprehensive evaluations against state-of-the-art traffic signal control methods on three real-world traffic datasets with up to 196 intersections, we empirically show that CoordLight consistently exhibits superior performance across diverse traffic networks with varying traffic flows. The code is available at https://github.com/marmotlab/CoordLight
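The abstract does not specify the form of the attention mechanism used in NAPO. As a rough illustration of how an agent might weight its neighbors, the sketch below uses generic scaled dot-product attention in plain Python. It is not the authors' implementation: the function name `neighbor_attention`, the embedding vectors, and their dimensions are all hypothetical.

```python
import math


def neighbor_attention(query, neighbor_keys, neighbor_values):
    """Generic scaled dot-product attention of one agent over its neighbors.

    query           -- embedding of the ego agent (hypothetical)
    neighbor_keys   -- one key vector per neighbor (hypothetical)
    neighbor_values -- one value vector per neighbor (hypothetical)
    Returns (weights, context): a softmax weight per neighbor and the
    weighted sum of the neighbor value vectors.
    """
    d = len(query)
    # Similarity between the ego agent and each neighbor, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in neighbor_keys
    ]
    # Numerically stable softmax over the neighbor scores.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of neighbor values.
    dim_v = len(neighbor_values[0])
    context = [
        sum(w * v[j] for w, v in zip(weights, neighbor_values))
        for j in range(dim_v)
    ]
    return weights, context


# Hypothetical 2-d embeddings for one junction with three neighbors.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1]]
weights, context = neighbor_attention(query, keys, values)
```

In this toy setup the neighbor whose key aligns most closely with the query receives the largest weight, which is the sense in which attention lets an agent prioritize influential neighbors.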