Adaptive Traffic Signal Control (ATSC) aims to optimize traffic flow and minimize delays by adjusting traffic lights in real time. Recent advances in Multi-Agent Reinforcement Learning (MARL) have shown promise for ATSC, yet existing approaches still suffer from limited representational capacity, which often leads to suboptimal performance and poor generalization in complex, dynamic traffic environments. Large Language Models (LLMs), by contrast, excel at semantic representation, reasoning, and analysis, but their propensity for hallucination and their slow inference often hinder direct application to decision-making tasks. To address these challenges, we propose LATS, a novel learning paradigm that integrates LLMs with MARL, leveraging the former's strong prior knowledge and inductive abilities to enhance the latter's decision-making. Specifically, we introduce a plug-and-play teacher-student learning module in which a trained embedding LLM serves as the teacher, generating rich semantic features that capture each intersection's topological structure and traffic dynamics. A much simpler student neural network then learns to emulate these features through knowledge distillation in the latent space, so that the final model can operate independently of the LLM when used downstream in the RL decision-making process. This integration significantly enhances the model's representational capacity across diverse traffic scenarios, leading to more efficient and generalizable control strategies. Extensive experiments on diverse traffic datasets demonstrate that our method improves the representation learning capability of RL models and thereby yields better overall performance and generalization than both traditional RL and LLM-only approaches. [...]
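The teacher-student idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the student architecture, the distillation loss, or how the LLM embeddings are obtained, so everything below is an assumption made for illustration: the teacher features are simulated by a fixed random linear map (a stand-in for frozen LLM embeddings), the student is a single linear layer, and the loss is a mean squared error between student and teacher features. All names (`make_teacher_features`, `LinearStudent`, `distill_loss`, `distill_step`) are hypothetical.

```python
import random


def make_teacher_features(n_samples, obs_dim, feat_dim, seed=0):
    """Simulate frozen teacher features (ASSUMPTION: stand-in for LLM embeddings).

    Returns per-intersection observations and the target features the
    student should imitate.
    """
    rng = random.Random(seed)
    teacher_w = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)]
                 for _ in range(feat_dim)]
    obs = [[rng.uniform(0.0, 1.0) for _ in range(obs_dim)]
           for _ in range(n_samples)]
    feats = [[sum(w * x for w, x in zip(row, o)) for row in teacher_w]
             for o in obs]
    return obs, feats


class LinearStudent:
    """A deliberately tiny student: one linear layer, no bias (ASSUMPTION)."""

    def __init__(self, obs_dim, feat_dim, seed=1):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                  for _ in range(feat_dim)]

    def forward(self, o):
        # Map one observation vector into the latent feature space.
        return [sum(w * x for w, x in zip(row, o)) for row in self.W]


def distill_loss(student, obs, feats):
    """Mean squared error between student and teacher features."""
    total = 0.0
    for o, t in zip(obs, feats):
        s = student.forward(o)
        total += sum((si - ti) ** 2 for si, ti in zip(s, t)) / len(t)
    return total / len(obs)


def distill_step(student, obs, feats, lr=0.5):
    """One full-batch gradient-descent step on the distillation loss."""
    feat_dim, obs_dim = len(student.W), len(student.W[0])
    n = len(obs)
    grad = [[0.0] * obs_dim for _ in range(feat_dim)]
    for o, t in zip(obs, feats):
        s = student.forward(o)
        for j in range(feat_dim):
            err = 2.0 * (s[j] - t[j]) / (feat_dim * n)
            for k in range(obs_dim):
                grad[j][k] += err * o[k]
    for j in range(feat_dim):
        for k in range(obs_dim):
            student.W[j][k] -= lr * grad[j][k]


if __name__ == "__main__":
    obs, feats = make_teacher_features(n_samples=32, obs_dim=4, feat_dim=3)
    student = LinearStudent(obs_dim=4, feat_dim=3)
    print("loss before:", distill_loss(student, obs, feats))
    for _ in range(200):
        distill_step(student, obs, feats)
    print("loss after: ", distill_loss(student, obs, feats))
```

After distillation only `student.forward` is needed to produce features for the RL policy, which is the sense in which the final model can run without the teacher. In a real system the simulated teacher features would be replaced by embeddings computed offline from the LLM.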