Reinforcement Learning (RL) for Traffic Signal Control (TSC) faces significant hurdles in real-world deployment due to limited generalization to dynamic traffic-flow variations. Existing approaches often overfit static traffic patterns and adopt action spaces incompatible with driver expectations. This paper proposes a robust Multi-Agent Reinforcement Learning (MARL) framework validated in the Vissim traffic simulator. The framework integrates three mechanisms: (1) Turning Ratio Randomization, a training strategy that exposes agents to dynamic turning probabilities to improve robustness in unseen scenarios; (2) a stability-oriented Exponential Phase Duration Adjustment action space, which balances responsiveness with precision through cyclical, exponentially scaled changes to phase durations; and (3) a Neighbor-Based Observation scheme built on the MAPPO algorithm under Centralized Training with Decentralized Execution (CTDE). By leveraging centralized updates, this approach approximates the efficacy of global observations while retaining scalable local communication. Experimental results demonstrate that our framework outperforms standard RL baselines, reducing average waiting time by over 10%. The proposed model exhibits superior generalization to unseen traffic scenarios and maintains high control stability, offering a practical solution for adaptive signal control.
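To make the first mechanism concrete, Turning Ratio Randomization can be sketched as resampling each approach's turning-probability vector at the start of every training episode, so no single static flow pattern dominates. The sketch below is illustrative only and is not the paper's implementation; the function name, the choice of a Dirichlet distribution (via normalized Gamma draws), and the concentration parameter `alpha` are all assumptions for exposition.

```python
import random


def sample_turning_ratios(num_branches: int = 3, alpha: float = 1.0) -> list[float]:
    """Sample a random turning-probability vector (e.g. left/through/right).

    Hypothetical sketch: normalizing independent Gamma(alpha, 1) draws
    yields a Dirichlet(alpha, ..., alpha) sample, i.e. a vector of
    non-negative probabilities that sums to 1. With alpha = 1 the split
    is uniform over the probability simplex, giving diverse episodes.
    """
    draws = [random.gammavariate(alpha, 1.0) for _ in range(num_branches)]
    total = sum(draws)
    return [d / total for d in draws]


# At episode reset, each intersection approach would receive a fresh
# split so the trained policy cannot overfit one fixed routing pattern.
ratios = sample_turning_ratios()
```

A higher `alpha` would concentrate samples near an even split, while `alpha < 1` favors skewed splits; tuning this trade-off is a design choice the abstract does not specify.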
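The exponential phase-duration action space can likewise be illustrated: a small discrete action fine-tunes the current green time, while larger actions change it by exponentially growing steps, and the result is clamped to safe bounds. This is a minimal sketch under assumed parameter values (`base`, `ratio`, and the duration limits are invented for illustration), not the paper's actual action mapping.

```python
def adjust_phase_duration(current: float, action: int,
                          base: float = 2.0, ratio: float = 2.0,
                          min_dur: float = 5.0, max_dur: float = 60.0) -> float:
    """Map a signed discrete action to an exponential green-time change.

    action = 0 keeps the duration; action = +/-k shifts it by
    +/- base * ratio**(k-1) seconds, so small |action| gives precise
    tuning and large |action| gives fast response. The result is
    clamped to [min_dur, max_dur] for control stability.
    """
    if action == 0:
        delta = 0.0
    else:
        k = abs(action)
        delta = (1.0 if action > 0 else -1.0) * base * ratio ** (k - 1)
    return min(max(current + delta, min_dur), max_dur)


# Example: from a 30 s green, actions -3..+3 reach
# {22, 26, 28, 30, 32, 34, 38} seconds under these assumed parameters.
```

Because each action only perturbs the current cyclical plan rather than picking an arbitrary next phase, the resulting signal behavior stays closer to what drivers expect from a fixed-cycle controller.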