Reinforcement Learning (RL) has been widely explored for Traffic Signal Control (TSC), yet no such system has been deployed in practice. A key barrier to progress in this area is the reality gap: the discrepancy that arises from differences between simulation models and their real-world counterparts. In this paper, we address this challenge by first presenting a comprehensive analysis of the simulation parameters that potentially contribute to this reality gap. We then examine two promising strategies for bridging it: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML). Both strategies were trained with a traffic simulation model of an intersection. The model was embedded in LemgoRL, a framework that integrates realistic, safety-critical requirements into the control system. We then evaluated the two methods on a separate model of the same intersection, developed with a different traffic simulator, thereby mimicking the reality gap. Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm, highlighting their potential to mitigate the reality gap in RL-based TSC systems.