Learning-based vehicle planning is receiving increasing attention with the emergence of diverse driving simulators and large-scale driving datasets. While offline reinforcement learning (RL) is well suited to these safety-critical tasks, it still struggles to plan over long horizons. In this work, we present a skill-based framework that enhances offline RL to overcome the long-horizon vehicle planning challenge. Specifically, we design a variational autoencoder (VAE) to learn skills from offline demonstrations. To mitigate the posterior collapse that affects common VAEs, we introduce a two-branch sequence encoder that captures both the discrete options and the continuous variations of complex driving skills. The final policy treats the learned skills as actions and can be trained with any off-the-shelf offline RL algorithm. This shifts the focus from per-step actions to temporally extended skills, thereby enabling long-term reasoning into the future. Extensive experiments on CARLA show that our model consistently outperforms strong baselines in both training and new scenarios. Additional visualizations and experiments demonstrate the interpretability and transferability of the extracted skills.
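The idea of treating skills as actions can be illustrated with a minimal sketch. It is not the implementation from this work. Every name in it (`Skill`, `decode_skill`, `rollout`), the (steer, throttle) action layout, and the hand-written decoder are assumptions made for illustration. A learned VAE decoder would take the place of the toy one. The sketch only shows how a skill made of a discrete option and a continuous variation expands into several low-level actions, so the high-level policy makes far fewer decisions than there are environment steps.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Action = Tuple[float, float]  # assumed layout: (steer, throttle)


@dataclass(frozen=True)
class Skill:
    """Hypothetical skill latent with a discrete and a continuous part."""
    option: int                    # discrete option, e.g. 0=left, 1=straight, 2=right
    variation: Tuple[float, ...]   # continuous variation, e.g. how sharply to steer


def decode_skill(skill: Skill, horizon: int) -> List[Action]:
    """Toy stand-in for a learned decoder: expand one skill into `horizon` actions."""
    base_steer = (-0.3, 0.0, 0.3)[skill.option]
    scale = skill.variation[0]
    return [(base_steer * scale, 0.5) for _ in range(horizon)]


def rollout(
    policy: Callable[[float], Skill],
    step: Callable[[float, Action], float],
    state: float,
    total_steps: int,
    horizon: int,
) -> Tuple[float, int]:
    """Run `total_steps` environment steps, querying the policy once per skill."""
    decisions = 0
    t = 0
    while t < total_steps:
        skill = policy(state)          # one high-level decision
        decisions += 1
        # Execute the decoded actions, truncating at the end of the episode.
        for action in decode_skill(skill, horizon)[: total_steps - t]:
            state = step(state, action)
            t += 1
    return state, decisions


def toy_step(heading: float, action: Action) -> float:
    """Assumed toy dynamics: the state is a heading that accumulates steering."""
    return heading + action[0]


def always_right(_state: float) -> Skill:
    """Assumed fixed policy that always picks the 'right' option at full scale."""
    return Skill(option=2, variation=(1.0,))
```

With a horizon of 10, a 100-step episode needs only 10 policy decisions, which is the sense in which skills shorten the effective planning horizon for the offline RL algorithm.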