Learning-based vehicle planning is receiving increasing attention with the emergence of diverse driving simulators and large-scale driving datasets. While offline reinforcement learning (RL) is well suited to these safety-critical tasks, it still struggles to plan over extended horizons. In this work, we present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge. Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations. To mitigate the posterior collapse common in VAEs, we introduce a two-branch sequence encoder that captures both the discrete options and the continuous variations of complex driving skills. The final policy treats the learned skills as actions and can be trained with any off-the-shelf offline RL algorithm. This shifts the focus from per-step actions to temporally extended skills, thereby enabling long-term reasoning. Extensive experiments on CARLA show that our model consistently outperforms strong baselines in both training and new scenarios. Additional visualizations and experiments demonstrate the interpretability and transferability of the extracted skills.
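The shift from per-step actions to temporally extended skills can be illustrated with a minimal sketch. This is not the paper's implementation: the skill decoder below is a hand-written stand-in for a learned VAE decoder, and the names `SKILL_HORIZON`, `decode_skill`, `rollout`, and `toy_policy`, the `(steer, throttle)` action layout, and the three option labels are all hypothetical. A skill is modeled as a pair of a discrete option and a continuous variation, mirroring the two kinds of information the abstract says the encoder captures.

```python
from typing import List, Tuple

Action = Tuple[float, float]  # (steer, throttle); hypothetical action layout
Skill = Tuple[int, float]     # (discrete option index, continuous variation)

SKILL_HORIZON = 10  # hypothetical number of low-level steps covered by one skill


def decode_skill(skill: Skill, horizon: int = SKILL_HORIZON) -> List[Action]:
    """Stand-in for a learned skill decoder: unroll one skill into `horizon` actions."""
    option, variation = skill
    # Hypothetical discrete options: 0 = keep lane, 1 = turn left, 2 = turn right.
    base_steer = {0: 0.0, 1: -0.3, 2: 0.3}[option]
    # The continuous variation modulates throttle, clipped to [0, 1].
    throttle = min(1.0, max(0.0, 0.5 + variation))
    return [(base_steer, throttle) for _ in range(horizon)]


def toy_policy(step: int) -> Skill:
    """Placeholder high-level policy: alternates between two fixed skills."""
    if (step // SKILL_HORIZON) % 2 == 0:
        return (0, 0.1)
    return (1, -0.2)


def rollout(policy, total_steps: int, horizon: int = SKILL_HORIZON):
    """Run the high-level policy; it is queried once per skill, not once per step."""
    actions: List[Action] = []
    decisions = 0
    while len(actions) < total_steps:
        skill = policy(len(actions))
        decisions += 1
        actions.extend(decode_skill(skill, horizon))
    # Truncate the last skill if it overshoots the episode length.
    return actions[:total_steps], decisions


actions, decisions = rollout(toy_policy, total_steps=95)
```

In this sketch a 95-step episode needs only 10 high-level decisions, which is the sense in which the policy's effective planning horizon shrinks when it acts in skill space. In an actual system the high-level policy would be trained with an offline RL algorithm over the learned skill space rather than hard-coded.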