In this paper, we propose HE-Drive: the first human-like-centric end-to-end autonomous driving system to generate trajectories that are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers tend to produce temporally inconsistent and uncomfortable trajectories. To solve these problems, HE-Drive first extracts key 3D spatial representations through sparse perception, which then serve as conditional inputs for a conditional Denoising Diffusion Probabilistic Model (DDPM)-based motion planner that generates temporally consistent multi-modal trajectories. A Vision-Language Model (VLM)-guided trajectory scorer subsequently selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving. Experiments show that HE-Drive not only achieves state-of-the-art performance (i.e., reduces the average collision rate by 71% compared with VAD) and efficiency (i.e., 1.9× faster than SparseDrive) on the challenging nuScenes and OpenScene datasets but also provides the most comfortable driving experience on real-world data. For more information, visit the project website: https://jmwang0117.github.io/HE-Drive/.
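To make the "generate candidates, then select the most comfortable one" step concrete, the following is a minimal sketch of the selection stage only. It is not the VLM-guided scorer of HE-Drive: as a stand-in, it ranks candidates by a simple squared-jerk cost, a common proxy for ride comfort. The function names `jerk_cost` and `select_most_comfortable`, the 2D waypoint representation, and the 0.5 s waypoint spacing are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]      # (x, y) waypoint in metres (assumed representation)
Trajectory = Sequence[Point]     # waypoints sampled at a fixed time step


def jerk_cost(traj: Trajectory, dt: float = 0.5) -> float:
    """Sum of squared third finite differences of position.

    The third finite difference divided by dt**3 approximates jerk,
    so a lower value indicates a smoother trajectory.
    """
    cost = 0.0
    for i in range(len(traj) - 3):
        for axis in (0, 1):
            p0, p1, p2, p3 = (traj[i + k][axis] for k in range(4))
            jerk = (p3 - 3 * p2 + 3 * p1 - p0) / dt ** 3
            cost += jerk * jerk
    return cost


def select_most_comfortable(candidates: List[Trajectory], dt: float = 0.5) -> int:
    """Return the index of the candidate with the lowest jerk cost.

    Hypothetical stand-in for a learned scorer; raises ValueError
    if `candidates` is empty.
    """
    costs = [jerk_cost(traj, dt) for traj in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)


# Two hand-made candidates: one with a stop-and-go pattern, one at constant speed.
jerky = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
smooth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
best_index = select_most_comfortable([jerky, smooth])
```

In this toy case the constant-speed candidate has zero jerk cost and is selected. In a full system, the candidate set would come from the diffusion-based planner, and the cost would combine comfort with safety and progress terms rather than jerk alone.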