Urban congestion remains a critical challenge, and traffic signal control (TSC) is a potent way to mitigate it. TSC is often modeled as a Markov decision process (MDP) and solved with reinforcement learning (RL), which has proven effective. However, existing RL-based TSC systems often overlook two issues: imperfect observations caused by degraded communication (packet loss, delays, and noise), and rare real-life events that the reward function does not cover, such as emergency vehicles. To address these limitations, we introduce a novel framework that integrates a large language model (LLM) with RL. The framework is designed to handle elements missing from the reward function and gaps in state information, thereby improving the policies of RL agents. In our approach, the RL agent first makes a decision based on the observed data. The LLM then evaluates whether that decision is reasonable and, if it is not, adjusts it accordingly. This approach can also be seamlessly added to existing RL-based TSC systems without modifying them. Extensive testing confirms that our approach reduces the average waiting time by $17.5\%$ under degraded communication conditions compared to traditional RL methods, underscoring its potential to advance practical RL applications in intelligent transportation systems. The related code can be found at \url{https://github.com/Traffic-Alpha/iLLM-TSC}.
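The propose-then-verify loop described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the linked repository: the `Observation` fields, `rl_policy`, `rule_based_reviewer`, and `select_action` are all hypothetical names, the "RL agent" is a simple longest-queue heuristic, and the "LLM" is replaced by a rule-based stand-in so the example runs without any model or simulator. It only aims to show how a reviewer can wrap an existing policy without changing it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Observation:
    """Hypothetical intersection observation; None marks a reading lost in transit."""
    queue_lengths: Sequence[Optional[int]]  # one entry per signal phase
    emergency_phase: Optional[int] = None   # phase an emergency vehicle needs, if any


def rl_policy(obs: Observation) -> int:
    """Stand-in for a trained RL agent: serve the longest *observed* queue."""
    known = [(q, i) for i, q in enumerate(obs.queue_lengths) if q is not None]
    if not known:
        return 0  # nothing observed, fall back to phase 0
    return max(known, key=lambda pair: pair[0])[1]


def rule_based_reviewer(obs: Observation, action: int) -> Optional[int]:
    """Stand-in for the LLM check (assumption: a real system would query a model).

    Returns a replacement action, or None to accept the proposed one.
    """
    if obs.emergency_phase is not None and action != obs.emergency_phase:
        return obs.emergency_phase  # event the reward function never covered
    return None


def select_action(
    obs: Observation,
    policy: Callable[[Observation], int],
    reviewer: Callable[[Observation, int], Optional[int]],
) -> int:
    """RL proposes, the reviewer verifies, and an unreasonable action is replaced."""
    proposed = policy(obs)
    override = reviewer(obs, proposed)
    return proposed if override is None else override


if __name__ == "__main__":
    normal = Observation(queue_lengths=[3, 8, None, 1])
    urgent = Observation(queue_lengths=[3, 8, None, 1], emergency_phase=2)
    print(select_action(normal, rl_policy, rule_based_reviewer))
    print(select_action(urgent, rl_policy, rule_based_reviewer))
```

Because `select_action` only calls the policy and never alters it, the same wrapper could in principle sit in front of any existing agent, which is the property the abstract emphasizes.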