This study integrates Transit Signal Priority (TSP) into multi-agent reinforcement learning (MARL) based traffic signal control. The first part of the study develops adaptive signal control based on MARL for a pair of coordinated intersections in a microscopic simulation environment. The two agents, one for each intersection, are centrally trained using a value decomposition network (VDN) architecture. Measured by overall intersection delay at a volume-to-capacity (v/c) ratio of 0.95, the trained agents perform slightly better than coordinated actuated signal control. In the second part of the study, the trained signal control agents serve as background signal controllers while event-based TSP agents are developed. In one variation, independent TSP agents are formulated and trained under a decentralized training and decentralized execution (DTDE) framework to implement TSP at each intersection. In the second variation, the two TSP agents are centrally trained under a centralized training and decentralized execution (CTDE) framework with a VDN architecture to select and implement coordinated TSP strategies across the two intersections. In both cases, the agents converge to the same bus delay value, but the independent agents exhibit high instability throughout the training process. In the test runs, the two independent agents reduce bus delay across the two intersections by 22% compared to the no-TSP case, while the coordinated TSP agents achieve a 27% delay reduction. In both cases, there is only a slight increase in delay for a majority of the side street movements.
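The VDN architecture named above trains agents jointly on a shared team reward by representing the joint action value as the sum of per-agent values. The following is a minimal sketch of that decomposition in a hypothetical tabular setting; the study's agents use neural networks inside a microscopic traffic simulator, so the state sizes, learning rate, and update rule here are illustrative assumptions only.

```python
import numpy as np

# Hypothetical tabular VDN sketch: two agents (one per intersection) each
# keep their own Q-table; the joint value is the SUM of the per-agent values,
# and a single shared (team) reward drives one TD update on that sum.
n_states, n_actions = 4, 2
q1 = np.zeros((n_states, n_actions))  # agent 1 (intersection 1)
q2 = np.zeros((n_states, n_actions))  # agent 2 (intersection 2)
alpha, gamma = 0.1, 0.9               # illustrative hyperparameters

def vdn_td_update(s1, a1, s2, a2, team_reward, ns1, ns2):
    """One TD step on the summed value Q1 + Q2 (the VDN decomposition)."""
    q_joint = q1[s1, a1] + q2[s2, a2]
    # Because the joint value is additive, the greedy bootstrap factorizes:
    # max over joint actions of the sum = sum of each agent's own max.
    target = team_reward + gamma * (q1[ns1].max() + q2[ns2].max())
    td = target - q_joint
    # The gradient of a sum w.r.t. each summand is 1, so the same TD error
    # flows to both agents -- centralized training, per-agent (decentralized)
    # greedy execution at test time.
    q1[s1, a1] += alpha * td
    q2[s2, a2] += alpha * td
    return td

td = vdn_td_update(0, 1, 2, 0, team_reward=1.0, ns1=1, ns2=3)
```

The additive decomposition is what allows centralized training with decentralized execution: at run time each agent maximizes only its own Q-values, yet those values were shaped by the shared reward.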