Trajectory planning is vital for autonomous driving, ensuring safe and efficient navigation in complex environments. While recent learning-based methods, particularly reinforcement learning (RL), have shown promise in specific scenarios, RL planners struggle with training inefficiency and with handling large-scale, real-world driving scenarios. In this paper, we introduce \textbf{CarPlanner}, a \textbf{C}onsistent \textbf{a}uto-\textbf{r}egressive \textbf{Planner} that uses RL to generate multi-modal trajectories. The auto-regressive structure enables efficient large-scale RL training, while the incorporation of consistency ensures stable policy learning by maintaining temporal consistency across time steps. Moreover, CarPlanner employs a generation-selection framework with an expert-guided reward function and an invariant-view module, simplifying RL training and enhancing policy performance. Extensive analysis demonstrates that our proposed RL framework effectively addresses the challenges of training efficiency and performance enhancement, positioning CarPlanner as a promising solution for trajectory planning in autonomous driving. To the best of our knowledge, we are the first to demonstrate that an RL-based planner can surpass both imitation learning (IL)-based and rule-based state-of-the-art (SOTA) methods on the challenging large-scale real-world dataset nuPlan. Our proposed CarPlanner surpasses RL-, IL-, and rule-based SOTA approaches on this demanding dataset.