Imitation learning (IL) is a simple and powerful way to use high-quality human driving data, which can be collected at scale, to produce human-like behavior. However, policies based on imitation learning alone often fail to sufficiently account for safety and reliability concerns. In this paper, we show how imitation learning combined with reinforcement learning using simple rewards can substantially improve the safety and reliability of driving policies over those learned from imitation alone. In particular, we train a policy on over 100k miles of urban driving data, and measure its effectiveness in test scenarios grouped by different levels of collision likelihood. Our analysis shows that while imitation can perform well in low-difficulty scenarios that are well covered by the demonstration data, our proposed approach significantly improves robustness on the most challenging scenarios (over 38% reduction in failures). To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.