Modern approaches to autonomous driving rely heavily on learned components trained with large amounts of human driving data via imitation learning. However, these methods require expensive data collection, and even then they struggle to safely handle long-tail scenarios and to avoid errors that compound over time. At the same time, pure Reinforcement Learning (RL) methods can fail to learn performant policies in settings like driving, where rewards are sparse, constrained, and challenging to define. Both of these challenges make it difficult to deploy purely cloned policies in safety-critical applications such as autonomous vehicles. In this paper, we propose Combining IMitation and Reinforcement Learning (CIMRL), a framework that trains driving policies in simulation by leveraging imitative motion priors and safety constraints. CIMRL does not require extensive reward specification and improves on the closed-loop behavior of pure cloning methods. By combining RL and imitation, we demonstrate that our method achieves state-of-the-art results on closed-loop simulation driving benchmarks.