Most current studies on reinforcement-learning-based decision-making and control for autonomous vehicles are conducted in simulated environments. Training and testing in these studies are carried out under rule-based microscopic traffic flow, with little consideration of transferring the trained models to real or near-real environments to verify their performance. This may lead to performance degradation when a trained model is tested in more realistic traffic scenes. In this study, we propose a method to randomize the driving style and behavior of surrounding vehicles by randomizing selected parameters of the car-following and lane-changing models of the rule-based microscopic traffic flow in SUMO. We trained policies with deep reinforcement learning algorithms under this domain-randomized rule-based microscopic traffic flow in freeway and merging scenes, and then tested them separately under rule-based microscopic traffic flow and high-fidelity microscopic traffic flow. Results indicate that the policy trained under domain-randomized traffic flow achieves a significantly higher success rate and cumulative reward than models trained under other microscopic traffic flows.
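The randomization described above can be sketched as follows. This is a minimal, illustrative example assuming SUMO's default Krauss car-following model and LC2013 lane-changing model, whose attributes TraCI exposes through `vehicle.setParameter` with the `carFollowModel.` and `laneChangeModel.` key prefixes. The parameter ranges are hypothetical placeholders, not the values used in the study.

```python
import random

# Illustrative (assumed) sampling ranges for domain randomization of
# SUMO's Krauss car-following model and LC2013 lane-changing model.
CF_RANGES = {
    "tau":   (0.5, 2.0),   # desired time headway [s]
    "sigma": (0.0, 1.0),   # driver imperfection (0 = perfect driving)
    "accel": (1.5, 3.5),   # maximum acceleration [m/s^2]
    "decel": (3.0, 5.0),   # comfortable deceleration [m/s^2]
}
LC_RANGES = {
    "lcStrategic":   (0.5, 2.0),  # eagerness for strategic lane changes
    "lcCooperative": (0.0, 1.0),  # willingness to cooperate with others
    "lcSpeedGain":   (0.5, 2.0),  # eagerness to change lanes for speed
}


def sample_driver_style(rng=random):
    """Draw one randomized driving style for a surrounding vehicle."""
    ranges = {**CF_RANGES, **LC_RANGES}
    return {key: rng.uniform(lo, hi) for key, (lo, hi) in ranges.items()}


def apply_to_vehicle(traci_conn, veh_id, params):
    """Push a sampled style onto a SUMO vehicle via TraCI (sketch)."""
    for key, value in params.items():
        # Lane-changing attributes start with "lc"; route the rest to
        # the car-following model via the documented key prefixes.
        prefix = "laneChangeModel." if key.startswith("lc") else "carFollowModel."
        traci_conn.vehicle.setParameter(veh_id, prefix + key, str(value))
```

In use, `sample_driver_style()` would be called once per surrounding vehicle at spawn time (or per episode), so that each training episode exposes the policy to a different mix of aggressive and conservative drivers.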