Most current studies on reinforcement-learning-based decision-making and control for autonomous vehicles are conducted in simulated environments. These studies train and test their policies under rule-based microscopic traffic flow, with little consideration of transferring them to real or near-real environments to evaluate their performance. As a result, performance may degrade when a trained model is tested in more realistic traffic scenes. In this study, we propose a method that randomizes the driving style and behavior of surrounding vehicles by randomizing certain parameters of the car-following and lane-changing models of rule-based microscopic traffic flow in SUMO. We trained policies with deep reinforcement learning algorithms under the domain-randomized rule-based microscopic traffic flow in freeway and merging scenes, and then tested them separately in rule-based microscopic traffic flow and high-fidelity microscopic traffic flow. Results indicate that the policy trained under domain-randomized traffic flow achieves a significantly better success rate and cumulative reward than models trained under other microscopic traffic flows.
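To make the randomization idea concrete, the following is a minimal sketch of how car-following and lane-changing parameters of SUMO vehicle types might be sampled per driver. It is not the method of this study: the choice of parameters, the sampling ranges in `PARAM_RANGES`, the uniform distribution, and the helper names `sample_vtype` and `build_routes_xml` are all assumptions made for illustration. Only the `vType` attribute names (such as `tau`, `sigma`, `lcCooperative`) follow SUMO's vehicle type definition.

```python
import random
import xml.etree.ElementTree as ET

# Hypothetical sampling ranges (assumed for illustration; the abstract does not
# state which parameters are randomized or over what bounds).
PARAM_RANGES = {
    # Car-following parameters (Krauss model)
    "accel": (1.5, 3.5),        # maximum acceleration, m/s^2
    "decel": (3.0, 6.0),        # comfortable deceleration, m/s^2
    "tau": (0.8, 2.0),          # desired time headway, s
    "minGap": (1.5, 3.5),       # minimum standstill gap, m
    "sigma": (0.0, 0.8),        # driver imperfection, 0..1
    "speedFactor": (0.8, 1.2),  # multiplier on the lane speed limit
    # Lane-changing parameters (LC2013 model)
    "lcStrategic": (0.5, 2.0),
    "lcCooperative": (0.0, 1.0),
    "lcSpeedGain": (0.5, 2.0),
    "lcAssertive": (0.5, 2.0),
}


def sample_vtype(type_id, rng):
    """Return one <vType> element with uniformly sampled parameters."""
    attrs = {
        "id": type_id,
        "carFollowModel": "Krauss",
        "laneChangeModel": "LC2013",
    }
    for name, (low, high) in PARAM_RANGES.items():
        attrs[name] = f"{rng.uniform(low, high):.2f}"
    return ET.Element("vType", attrs)


def build_routes_xml(n_types, seed=None):
    """Build a <routes> document holding n_types randomized vehicle types."""
    rng = random.Random(seed)
    root = ET.Element("routes")
    for i in range(n_types):
        root.append(sample_vtype(f"random_driver_{i}", rng))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Print three randomized driver types as SUMO-style XML.
    print(build_routes_xml(3, seed=0))
```

Under these assumptions, the resulting XML could be written to an additional file and regenerated with a fresh seed at the start of each training episode, so that the surrounding vehicles show a different mix of driving styles every time.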