Diffusion-based trajectory planners have demonstrated strong capability for modeling the multimodal nature of human driving behavior, but their reliance on iterative stochastic sampling poses critical challenges for real-time, safety-critical deployment. In this work, we present RAPiD, a deterministic policy extraction framework that distills a pretrained diffusion-based planner into an efficient policy while eliminating diffusion sampling. Using score-regularized policy optimization, we leverage the score function of the pretrained diffusion planner as a behavior prior to regularize policy learning. To promote safety and passenger comfort, the policy is optimized against a critic trained to imitate a predictive driver controller, providing dense, safety-focused supervision beyond conventional imitation learning. Evaluations demonstrate that RAPiD achieves competitive performance on closed-loop nuPlan scenarios with an 8x speedup over diffusion baselines, while achieving state-of-the-art generalization among learning-based planners on the interPlan benchmark. Code is available at: https://github.com/ruturajreddy/RAPiD.
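The core idea of score-regularized policy extraction can be sketched in a few lines. The toy below is an illustrative assumption, not the paper's implementation: a linear `policy` and a bilinear `critic` stand in for RAPiD's networks, and an analytic Gaussian score stands in for the frozen diffusion planner's learned score function. The policy maximizes the critic's value while a surrogate regularizer, whose gradient with respect to the action equals the negative behavior score, pulls actions toward high-density modes of the behavior prior.

```python
import torch

torch.manual_seed(0)
obs_dim, act_dim = 4, 2

# Hypothetical stand-ins for RAPiD's networks (assumptions, not the paper's code):
policy = torch.nn.Linear(obs_dim, act_dim)        # deterministic policy being extracted
critic = torch.nn.Bilinear(obs_dim, act_dim, 1)   # frozen critic (safety-focused supervision)

mu = torch.ones(act_dim)  # mode of a toy Gaussian behavior prior

def behavior_score(a):
    # Analytic score of N(mu, I); stands in for the frozen diffusion
    # planner's score function evaluated at the policy's action.
    return -(a - mu)

def srpo_loss(s, beta=5.0):
    a = policy(s)
    q = critic(s, a).squeeze(-1).mean()
    # Surrogate term: its gradient w.r.t. a is -beta * score(a), so gradient
    # descent pushes actions toward high-density behavior modes while the
    # -q term pushes them toward high critic value.
    reg = -(a * behavior_score(a).detach()).sum(dim=-1).mean()
    return -q + beta * reg

s = torch.randn(32, obs_dim)
with torch.no_grad():
    d0 = (policy(s) - mu).norm(dim=-1).mean().item()  # pre-training distance to prior mode

opt = torch.optim.SGD(policy.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    srpo_loss(s).backward()
    opt.step()

with torch.no_grad():
    d1 = (policy(s) - mu).norm(dim=-1).mean().item()  # post-training distance shrinks
```

Note the single forward pass through `policy` at inference time: unlike the diffusion planner, no iterative denoising is needed, which is the source of the reported speedup.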