Traffic signal control (TSC) is a core challenge in urban mobility, where real-time decisions must balance efficiency and safety. Existing methods, ranging from rule-based heuristics to reinforcement learning (RL), often struggle to generalize to complex, dynamic, and safety-critical scenarios. We introduce VLMLight, a novel TSC framework that integrates vision-language meta-control with dual-branch reasoning. At the core of VLMLight is the first image-based traffic simulator that enables multi-view visual perception at intersections, allowing policies to reason over rich cues such as vehicle type, motion, and spatial density. A large language model (LLM) serves as a safety-prioritized meta-controller, selecting between a fast RL policy for routine traffic and a structured reasoning branch for critical cases. In the latter, multiple LLM agents collaborate to assess traffic phases, prioritize emergency vehicles, and verify rule compliance. Experiments show that VLMLight reduces waiting times for emergency vehicles by up to 65% over RL-only systems, while preserving real-time performance in standard conditions with less than 1% degradation. VLMLight offers a scalable, interpretable, and safety-aware solution for next-generation traffic signal control.
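To make the dual-branch dispatch concrete, the following is a minimal Python sketch of the safety-prioritized meta-control loop described above. It is an illustration only: every name in it (`MultiViewObservation`, `is_critical_scene`, `rl_policy`, `reasoning_branch`, `select_phase`) is a hypothetical placeholder we introduce here, not the paper's actual API, and the critical-scene check stands in for the LLM meta-controller with a single rule.

```python
# Minimal sketch of a dual-branch meta-control loop, assuming hypothetical
# components: a multi-view observation type, a criticality check standing in
# for the LLM meta-controller, and two phase-selection branches.
from dataclasses import dataclass
from typing import List


@dataclass
class MultiViewObservation:
    """Multi-view camera images plus scene cues at one intersection."""
    images: List[bytes]          # one image per approach-facing camera
    has_emergency_vehicle: bool  # e.g., flagged by the vision-language model


def is_critical_scene(obs: MultiViewObservation) -> bool:
    # Stand-in for the LLM meta-controller's judgment: in VLMLight this
    # decision is made by the LLM; here a single rule illustrates it.
    return obs.has_emergency_vehicle


def rl_policy(obs: MultiViewObservation) -> int:
    # Fast branch: a pretrained RL policy returns the next signal phase
    # index for routine traffic. Placeholder implementation.
    return 0


def reasoning_branch(obs: MultiViewObservation) -> int:
    # Slow branch: in the paper, multiple LLM agents assess traffic phases,
    # prioritize emergency vehicles, and verify rule compliance before
    # committing to a phase. Placeholder implementation.
    return 1


def select_phase(obs: MultiViewObservation) -> int:
    """Safety-prioritized dispatch: critical scenes go to the structured
    reasoning branch; routine traffic stays on the fast RL policy."""
    if is_critical_scene(obs):
        return reasoning_branch(obs)
    return rl_policy(obs)
```

The key design point this sketch captures is that the expensive, interpretable reasoning path is invoked only when the meta-controller deems the scene safety-critical, which is how the framework can keep routine-traffic latency within 1% of an RL-only system while still handling emergency vehicles.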