CLOVER: Closed-Loop Value Estimation \& Ranking for End-to-End Autonomous Driving Planning

End-to-end autonomous driving planners are commonly trained by imitating a single logged trajectory, yet evaluated by rule-based planning metrics that measure safety, feasibility, progress, and comfort. This creates a training--evaluation mismatch: trajectories close to the logged path may violate planning rules, while alternatives farther from the demonstration can remain valid and high-scoring. The mismatch is especially limiting for proposal-selection planners, whose performance depends on candidate-set coverage and scorer ranking quality. We propose CLOVER, a Closed-LOop Value Estimation and Ranking framework for end-to-end autonomous driving planning. CLOVER follows a lightweight generator--scorer formulation: a generator produces diverse candidate trajectories, and a scorer predicts planning-metric sub-scores to rank them at inference time. To expand proposal support beyond single-trajectory imitation, CLOVER constructs evaluator-filtered pseudo-expert trajectories and trains the generator with set-level coverage supervision. It then performs conservative closed-loop self-distillation: the scorer is fitted to true evaluator sub-scores on generated proposals, while the generator is refined toward teacher-selected top-$k$ and vector-Pareto targets with stability regularization. We analyze when an imperfect scorer can improve the generator, showing that scorer-mediated refinement is reliable when scorer-selected targets are enriched under the true evaluator and updates remain conservative. On NAVSIM, CLOVER achieves 94.5 PDMS and 90.4 EPDMS, establishing a new state of the art. On the more challenging NavHard split, it obtains 48.3 EPDMS, matching the strongest reported result. On supplementary nuScenes open-loop evaluation, CLOVER achieves the lowest L2 error and collision rate among compared methods. Code and data will be released at https://github.com/WilliamXuanYu/CLOVER.
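
As an illustration of the two target-selection rules named above, the following minimal sketch picks top-$k$ and vector-Pareto candidates from per-candidate sub-score vectors. It is not the released CLOVER code: the function names, the example sub-score values, and the mean aggregation used for the top-$k$ ranking are all assumptions made for illustration, and every sub-score is assumed to be "higher is better".

```python
from typing import Callable, List, Sequence

# One vector of sub-scores per candidate trajectory (assumed: higher is better).
Scores = Sequence[float]


def mean_score(s: Scores) -> float:
    """Placeholder aggregate (an assumption, not the evaluator's real formula)."""
    return sum(s) / len(s)


def top_k_indices(sub_scores: List[Scores], k: int,
                  aggregate: Callable[[Scores], float] = mean_score) -> List[int]:
    """Indices of the k candidates with the highest aggregated score."""
    order = sorted(range(len(sub_scores)),
                   key=lambda i: aggregate(sub_scores[i]),
                   reverse=True)
    return order[:k]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_indices(sub_scores: List[Scores]) -> List[int]:
    """Indices of candidates that no other candidate dominates."""
    return [
        i for i, s in enumerate(sub_scores)
        if not any(dominates(t, s)
                   for j, t in enumerate(sub_scores) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical sub-scores for four candidate trajectories.
    candidates = [
        (1.0, 0.2, 0.9),
        (0.9, 0.8, 0.7),
        (0.5, 0.1, 0.6),
        (0.8, 0.9, 0.1),
    ]
    print("top-2:", top_k_indices(candidates, 2))
    print("Pareto set:", pareto_indices(candidates))
```

The two rules can disagree: a candidate with a low aggregate can still be Pareto-optimal if it is best on one sub-score, which is one plausible reason to use both kinds of target when refining a generator.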
