Autonomous driving involves complex decision-making in highly interactive environments, requiring thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interaction behavior, efficient learning critically depends on scalable state representations. In contrast to imitation learning methods, deep reinforcement learning methods for autonomous driving are still bottlenecked by high-dimensional state representations. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while requiring only one million experiences for training. By interacting safely with other road users, RecurrDriveNet causes fewer than one infraction per kilometer driven.