Reinforcement learning has been demonstrated to outperform even the best humans in complex domains like video games. However, running reinforcement learning experiments at the scale required for autonomous driving is extremely difficult. Building a large-scale reinforcement learning system and distributing it across many GPUs is challenging. Gathering experience on real-world vehicles during training is prohibitive from both a safety and a scalability perspective. Therefore, an efficient and realistic driving simulator that draws on a large amount of real-world driving data is required. We bring these capabilities together and conduct large-scale reinforcement learning experiments for autonomous driving. We demonstrate that our policy performance improves with increasing scale. Our best-performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25% compared to the policies produced by state-of-the-art machine learning for autonomous driving.