Reinforcement learning has been demonstrated to outperform even the best humans in complex domains like video games. However, running reinforcement learning experiments at the scale required for autonomous driving is extremely difficult. Building a large-scale reinforcement learning system and distributing it across many GPUs is challenging. Gathering experience on real-world vehicles during training is prohibitive from both a safety and a scalability perspective. Therefore, an efficient and realistic driving simulator is required that uses a large amount of data from real-world driving. We bring these capabilities together and conduct large-scale reinforcement learning experiments for autonomous driving. We demonstrate that our policy performance improves with increasing scale. Our best-performing policy reduces the failure rate by 64% while improving the rate of driving progress by 25% compared to the policies produced by state-of-the-art machine learning for autonomous driving.