Numerous solutions have been proposed for Traffic Signal Control (TSC) to provide efficient transportation and reduce the waste caused by congestion. Recently, Reinforcement Learning (RL) methods trained through trial and error in simulators have achieved promising results, raising confidence that cities' congestion problems can be solved. However, performance gaps remain when simulator-trained policies are deployed in the real world. These gaps stem mainly from differences in system dynamics between the training simulator and real-world environments. Large Language Models (LLMs) are trained on massive corpora and have demonstrated remarkable inference abilities. In this work, we leverage LLMs to understand and profile system dynamics through a prompt-based grounded action transformation. Given a cloze prompt template, the pre-trained LLM fills in the answer from the accessible context; in this way, its inference ability is used to understand how weather conditions, traffic states, and road types influence traffic dynamics. With this awareness, the policy's actions are grounded in realistic dynamics, which helps the agent learn a more realistic policy. We conduct experiments with DQN to show the effectiveness of the proposed PromptGAT in mitigating the performance gap from simulation to reality (sim-to-real).
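To make the grounding idea concrete, here is a minimal, hypothetical sketch. In grounded action transformation, an action chosen by the policy is replaced by the action that makes the simulator reproduce what the real environment would do: a forward model predicts the real next state, and an inverse model of the simulator recovers the action that reaches that state. The sketch assumes one plausible way an LLM could inform this step, by answering a cloze prompt about the context and scaling a toy forward model accordingly. The template wording, `stub_llm`, the answer-to-factor mapping, and the scalar linear dynamics are all illustrative assumptions, not PromptGAT's actual implementation.

```python
from typing import Callable, Dict

# Hypothetical cloze template; the real prompt wording is not given in the abstract.
CLOZE_TEMPLATE = (
    "Weather: {weather}. Road type: {road_type}. Queue length: {queue} vehicles. "
    "Compared with clear weather, vehicles will move [MASK]."
)

def build_prompt(context: Dict[str, object]) -> str:
    """Fill the cloze template with the accessible context."""
    return CLOZE_TEMPLATE.format(**context)

def stub_llm(prompt: str) -> str:
    """Stand-in for a pre-trained LLM filling the [MASK] slot (toy keyword rule)."""
    text = prompt.lower()
    return "slower" if ("rain" in text or "snow" in text) else "the same"

def dynamics_factor(answer: str) -> float:
    """Hypothetical mapping from the LLM's cloze answer to a dynamics scaling factor."""
    return {"slower": 0.75, "the same": 1.0, "faster": 1.25}.get(answer, 1.0)

def forward_real(state: float, action: float, factor: float) -> float:
    """Toy forward model of the real environment: predicted next state."""
    return state + factor * action

def inverse_sim(state: float, next_state: float, sim_gain: float = 1.0) -> float:
    """Toy inverse model of the simulator: action that moves state to next_state."""
    return (next_state - state) / sim_gain

def sim_step(state: float, action: float, sim_gain: float = 1.0) -> float:
    """Toy simulator transition (linear in the action)."""
    return state + sim_gain * action

def grounded_action(
    state: float,
    action: float,
    context: Dict[str, object],
    llm: Callable[[str], str] = stub_llm,
) -> float:
    """Replace the policy's action with one that reproduces the predicted real transition."""
    factor = dynamics_factor(llm(build_prompt(context)))
    predicted_next = forward_real(state, action, factor)
    return inverse_sim(state, predicted_next)

if __name__ == "__main__":
    ctx = {"weather": "heavy rain", "road_type": "arterial", "queue": 12}
    a_hat = grounded_action(state=10.0, action=2.0, context=ctx)
    print(a_hat)                  # 1.5
    print(sim_step(10.0, a_hat))  # 11.5
```

Under rain, the stub answers "slower", so the policy's action of 2.0 is grounded to 1.5 and the simulator follows the transition the toy real-world model predicts. In clear weather the factor is 1.0 and the action passes through unchanged.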