Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of human driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human-feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving preferences. TrajHF incorporates a multi-conditional denoiser and reinforcement learning from human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves a PDMS of 93.95 on the NavSim benchmark, significantly exceeding other methods. TrajHF sets a new paradigm for personalized and adaptable trajectory generation in autonomous driving.
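To make the core idea concrete, here is a minimal toy sketch of preference-based alignment over multi-modal trajectory candidates. This is not the TrajHF pipeline (which finetunes a generative denoiser with human feedback); it only illustrates the underlying principle of scoring diverse generated trajectories with a preference signal and favoring the preferred mode. All function names, the random-walk generator, and the `target_speed` preference proxy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectories(n, length=8):
    # Stand-in for a multi-modal generative trajectory model:
    # random walks with varying step scales mimic different
    # "driving styles" (cautious vs. aggressive).
    scales = rng.uniform(0.2, 2.0, size=n)
    steps = rng.normal(size=(n, length)) * scales[:, None]
    return np.cumsum(steps, axis=1)

def preference_reward(traj, target_speed=0.5):
    # Hypothetical human-feedback proxy: prefer trajectories whose
    # mean step size is close to the rider's preferred pace.
    return -abs(np.abs(np.diff(traj)).mean() - target_speed)

def align(trajs):
    # Best-of-N selection: score every candidate with the
    # preference reward and keep the highest-scoring mode.
    rewards = np.array([preference_reward(t) for t in trajs])
    return trajs[np.argmax(rewards)], rewards

trajs = sample_trajectories(64)
aligned, rewards = align(trajs)
```

In the paper's setting, the preference signal would come from human feedback rather than a hand-written proxy, and it would be used to update the denoiser's parameters via reinforcement learning instead of selecting among fixed samples.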