Traffic simulators are used to generate data for learning in intelligent transportation systems (ITSs). A key question is to what extent their modelling assumptions affect the capabilities of ITSs to adapt to various scenarios when deployed in the real world. This work focuses on two simulators commonly used to train reinforcement learning (RL) agents for traffic applications, CityFlow and SUMO. A controlled virtual experiment varying driver behavior and simulation scale finds evidence against distributional equivalence in RL-relevant measures from these simulators, with the root mean squared error and KL divergence being significantly greater than 0 for all assessed measures. While granular real-world validation generally remains infeasible, these findings suggest that traffic simulators are not a deus ex machina for RL training: understanding the impacts of inter-simulator differences is necessary to train and deploy RL-based ITSs.
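To make the two comparison statistics concrete, the following is a minimal sketch of how RMSE and a histogram-based estimate of KL divergence could be computed for one measure recorded under matched settings in two simulators. It is not the authors' pipeline: the function names, the bin count, the smoothing constant `eps`, and the synthetic "mean travel time" samples are all assumptions made for illustration, and no simulator API is called.

```python
import math
import random


def rmse(a, b):
    """Root mean squared error between two equal-length, paired sequences."""
    if len(a) != len(b):
        raise ValueError("RMSE needs paired samples of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def _histogram(samples, lo, hi, bins):
    """Count samples in `bins` equal-width bins spanning [lo, hi]."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp so that x == hi falls into the last bin.
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


def kl_divergence(p_samples, q_samples, bins=20, eps=1e-9):
    """Histogram estimate of KL(P || Q) using bins shared by both samples.

    `eps` is an assumed smoothing constant that keeps empty bins from
    producing log(0) or division by zero.
    """
    lo = min(min(p_samples), min(q_samples))
    hi = max(max(p_samples), max(q_samples))
    if hi == lo:
        return 0.0  # every value is identical, so the histograms coincide
    p_counts = _histogram(p_samples, lo, hi, bins)
    q_counts = _histogram(q_samples, lo, hi, bins)
    p_total = sum(p_counts) + eps * bins
    q_total = sum(q_counts) + eps * bins
    p = [(c + eps) / p_total for c in p_counts]
    q = [(c + eps) / q_total for c in q_counts]
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Synthetic stand-ins for one measure (e.g. mean travel time in seconds)
    # collected from two simulators; real values would come from the runs.
    rng = random.Random(0)
    sim_a = [rng.gauss(30.0, 5.0) for _ in range(1000)]
    sim_b = [rng.gauss(33.0, 6.0) for _ in range(1000)]
    print("RMSE:", rmse(sim_a, sim_b))
    print("KL(A || B):", kl_divergence(sim_a, sim_b))
```

Both statistics are exactly 0 when the two inputs are identical, which is the reference point for the "significantly greater than 0" finding. KL divergence is asymmetric, so KL(A || B) and KL(B || A) generally differ, and the histogram estimate depends on the chosen bin count.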