Efficient traffic signal control (TSC) is essential for urban mobility, but traditional systems struggle to handle the complexity of real-world traffic. Multi-agent reinforcement learning (MARL) offers adaptive solutions, but online MARL requires extensive interaction with the environment, making it costly and impractical. Offline MARL mitigates these challenges by training on historical traffic data, but it faces significant difficulties with the heterogeneous behavior policies found in real-world datasets, where mixed-quality data complicates learning. We introduce OffLight, a novel offline MARL framework designed to handle heterogeneous behavior policies in TSC datasets. To improve learning efficiency, OffLight incorporates Importance Sampling (IS) to correct for distributional shift and Return-Based Prioritized Sampling (RBPS) to focus training on high-quality experiences. OffLight utilizes a Gaussian Mixture Variational Graph Autoencoder (GMM-VGAE) to capture the diverse distribution of behavior policies from local observations. Extensive experiments across real-world urban traffic scenarios show that OffLight outperforms existing offline RL methods, achieving up to a 7.8% reduction in average travel time and an 11.2% decrease in queue length. Ablation studies confirm the effectiveness of OffLight's components in handling heterogeneous data and improving policy performance. These results highlight OffLight's scalability and its potential to improve urban traffic management without the risks of online learning.
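To make the sampling idea concrete, the sketch below illustrates the general principle behind return-based prioritized sampling: episodes with higher returns are drawn for training more often than low-return ones. This is a minimal, hypothetical illustration of the concept; the function name, the softmax weighting, and the `temperature` parameter are assumptions for exposition, not OffLight's actual implementation.

```python
import math
import random

def rbps_sample(episode_returns, batch_size, temperature=1.0, seed=0):
    """Sample episode indices with probability proportional to
    softmax(return / temperature), so high-return episodes dominate.

    Illustrative sketch only -- not the paper's implementation.
    """
    rng = random.Random(seed)
    # Subtract the max return before exponentiating for numerical stability.
    m = max(episode_returns)
    weights = [math.exp((r - m) / temperature) for r in episode_returns]
    return rng.choices(range(len(episode_returns)), weights=weights, k=batch_size)

# Toy data: returns could be, e.g., negative total travel time per episode,
# so episode 2 is the highest-quality trajectory and is sampled most often.
returns = [-120.0, -80.0, -20.0]
batch = rbps_sample(returns, batch_size=5, temperature=10.0)
```

A lower `temperature` sharpens the distribution toward the single best episode, while a higher one approaches uniform sampling over the dataset.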