Autonomous driving involves complex decision-making in highly interactive environments and requires thoughtful negotiation with other traffic participants. While reinforcement learning provides a way to learn such interactive behavior, efficient learning critically depends on scalable state representations. Unlike in imitation learning, high-dimensional state representations still constitute a major bottleneck for deep reinforcement learning in autonomous driving. In this paper, we study the challenges of constructing bird's-eye-view representations for autonomous driving and propose a recurrent learning architecture for long-horizon driving. Our PPO-based approach, called RecurrDriveNet, is demonstrated on a simulated autonomous driving task in CARLA, where it outperforms traditional frame-stacking methods while requiring only one million experiences for training. By interacting safely with other road users, RecurrDriveNet causes fewer than one infraction per kilometer driven.
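The abstract contrasts frame stacking with a recurrent architecture but gives no architectural details. The following is a minimal sketch of that general contrast, not of RecurrDriveNet itself. It assumes each bird's-eye-view frame has already been flattened into a small feature vector and uses a plain GRU cell with random, untrained weights. The names `FrameStack` and `TinyGRUCell` and all dimensions are hypothetical, and the PPO training loop is omitted entirely.

```python
# Hypothetical illustration of frame stacking vs. a recurrent state;
# this is NOT the RecurrDriveNet architecture, whose details are not given here.
import math
import random
from collections import deque


class FrameStack:
    """Concatenate the k most recent observations (fixed-length memory)."""

    def __init__(self, k, obs_dim):
        self.k = k
        # Pre-fill with zero frames so the stacked vector always has k * obs_dim entries.
        self.frames = deque([[0.0] * obs_dim for _ in range(k)], maxlen=k)

    def push(self, obs):
        self.frames.append(list(obs))  # the oldest frame is dropped automatically
        return [x for frame in self.frames for x in frame]


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRUCell:
    """Minimal GRU cell: a fixed-size hidden state summarizes the whole history."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        # Input (W) and recurrent (U) weights per gate: update z, reset r, candidate n.
        # Biases are omitted and the weights are random, i.e. untrained.
        self.W = {g: mat(hidden_dim, obs_dim) for g in "zrn"}
        self.U = {g: mat(hidden_dim, hidden_dim) for g in "zrn"}

    @staticmethod
    def _matvec(m, v):
        return [sum(a * b for a, b in zip(row, v)) for row in m]

    def step(self, obs, h):
        wz, wr, wn = (self._matvec(self.W[g], obs) for g in "zrn")
        z = [_sigmoid(a + b) for a, b in zip(wz, self._matvec(self.U["z"], h))]
        r = [_sigmoid(a + b) for a, b in zip(wr, self._matvec(self.U["r"], h))]
        # Candidate state uses the reset-gated previous hidden state: U_n (r * h).
        un = self._matvec(self.U["n"], [ri * hi for ri, hi in zip(r, h)])
        n = [math.tanh(a + b) for a, b in zip(wn, un)]
        # New state interpolates between the candidate and the previous state.
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


# Usage: feed 100 stand-in "BEV frames" through both memory mechanisms.
obs_dim, k, hidden_dim = 4, 3, 8
stack = FrameStack(k, obs_dim)
cell = TinyGRUCell(obs_dim, hidden_dim)
h = [0.0] * hidden_dim
rng = random.Random(1)
for t in range(100):
    bev = [rng.random() for _ in range(obs_dim)]  # placeholder for a flattened BEV frame
    stacked = stack.push(bev)
    h = cell.step(bev, h)
print(len(stacked), len(h))  # prints: 12 8
```

The design point is that the frame-stacked input grows linearly with `k` and discards everything older than `k` steps. The recurrent state, by contrast, has a constant size regardless of episode length. In a typical recurrent PPO setup, such a hidden state would feed the policy and value heads and be trained through backpropagation through time. How RecurrDriveNet does this is not specified in this abstract.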