Traffic signal control (TSC) is a complex and important task that affects the daily lives of millions of people. Reinforcement learning (RL) has shown promising results in optimizing traffic signal control, but current RL-based TSC methods are trained mainly in simulation and suffer from the performance gap between simulation and the real world. In this paper, we propose a simulation-to-real-world (sim-to-real) transfer approach called UGAT, which transfers a policy trained in a simulated environment to a real-world environment by dynamically transforming actions in the simulation, taking uncertainty into account, to mitigate the domain gap in transition dynamics. We evaluate our method on a simulated traffic environment and show that it significantly improves the performance of the transferred RL policy in the real world.
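The abstract does not spell out how actions are transformed, so the following is only a minimal sketch of one plausible reading: an action proposed by the policy is replaced in simulation by the action whose simulated outcome best matches what a learned real-world model predicts, and the replacement is skipped when that model is uncertain. The toy queue dynamics, the uncertainty score, the threshold, and all function names (`sim_step`, `forward_real`, `inverse_sim`, `gated_action`) are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple

# Hypothetical toy setting: the state is a queue length and the action is
# the number of green slots (0 to 3) granted to that queue.
ACTIONS: Sequence[int] = (0, 1, 2, 3)


def sim_step(queue: int, action: int) -> int:
    """Toy simulator dynamics (assumed): each green slot clears 2 vehicles."""
    return max(0, queue - 2 * action)


def forward_real(queue: int, action: int) -> Tuple[int, float]:
    """Stand-in for a learned real-world model (assumed).

    Returns the predicted next queue and an uncertainty score in [0, 1].
    Here each green slot clears only 1 vehicle, and queues above 10 are
    treated as poorly covered by data, hence high uncertainty.
    """
    predicted = max(0, queue - action)
    uncertainty = 0.1 if queue <= 10 else 0.9
    return predicted, uncertainty


def inverse_sim(queue: int, target_next: int) -> int:
    """Pick the simulator action whose outcome is closest to target_next."""
    return min(ACTIONS, key=lambda a: abs(sim_step(queue, a) - target_next))


def gated_action(queue: int, action: int, threshold: float = 0.5) -> int:
    """Return the action to execute in the simulator.

    If the real-world model is too uncertain, keep the policy's action;
    otherwise use the transformed action that makes the simulator imitate
    the predicted real-world transition.
    """
    predicted_next, uncertainty = forward_real(queue, action)
    if uncertainty >= threshold:
        return action
    return inverse_sim(queue, predicted_next)


if __name__ == "__main__":
    print(gated_action(8, 2))   # low uncertainty: transformed action, prints 1
    print(gated_action(20, 2))  # high uncertainty: original action, prints 2
```

In this toy setting the simulator clears queues twice as fast as the assumed real-world model, so a request for two green slots is transformed into one slot when the model is trusted and left unchanged when it is not.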