SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation

The well-established modular autonomous driving system is decoupled into standalone tasks, e.g. perception, prediction and planning, and therefore suffers from information loss and error accumulation across modules. In contrast, end-to-end paradigms unify multiple tasks into a fully differentiable framework, allowing for optimization in a planning-oriented spirit. Despite the great potential of end-to-end paradigms, neither the performance nor the efficiency of existing methods is satisfactory, particularly in terms of planning safety. We attribute this to the computationally expensive BEV (bird's eye view) features and the straightforward design for prediction and planning. To this end, we explore sparse representation and revisit the task design for end-to-end autonomous driving, proposing a new paradigm named SparseDrive. Concretely, SparseDrive consists of a symmetric sparse perception module and a parallel motion planner. The sparse perception module unifies detection, tracking and online mapping with a symmetric model architecture, learning a fully sparse representation of the driving scene. For motion prediction and planning, we revisit the great similarity between these two tasks, which leads to a parallel design for the motion planner. Based on this parallel design, which models planning as a multi-modal problem, we propose a hierarchical planning selection strategy, which incorporates a collision-aware rescore module, to select a rational and safe trajectory as the final planning output. With these effective designs, SparseDrive surpasses previous state-of-the-art methods by a large margin on all tasks, while achieving much higher training and inference efficiency. Code will be available at https://github.com/swc-17/SparseDrive to facilitate future research.
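The abstract names a hierarchical planning selection strategy with a collision-aware rescore module but does not spell out its steps. As a rough illustration only, the sketch below assumes three stages: filter the multi-modal planning candidates by a high-level driving command, set the score of any candidate that collides with a predicted agent trajectory to zero, and pick the highest-scoring remainder. The function names (`collides`, `select_plan`), the data layout, the point-distance collision test and the `safe_dist` threshold are all hypothetical and are not taken from the SparseDrive code.

```python
from math import hypot


def collides(ego_traj, agent_traj, safe_dist):
    """Return True if ego and agent are closer than safe_dist at any shared timestep.

    Assumption: trajectories are lists of (x, y) waypoints on the same time grid,
    and a simple centre-to-centre distance stands in for a real collision check.
    """
    return any(
        hypot(ex - ax, ey - ay) < safe_dist
        for (ex, ey), (ax, ay) in zip(ego_traj, agent_traj)
    )


def select_plan(candidates, command, agent_trajs, safe_dist=2.0):
    """Pick one planning trajectory from multi-modal candidates.

    candidates: dict mapping a driving command to a list of (score, trajectory).
    agent_trajs: predicted trajectories of surrounding agents.
    """
    # Stage 1 (assumed): keep only the modes that match the driving command.
    modes = candidates[command]

    # Stage 2 (assumed): collision-aware rescoring, zeroing out unsafe modes.
    rescored = []
    for score, traj in modes:
        if any(collides(traj, agent, safe_dist) for agent in agent_trajs):
            score = 0.0
        rescored.append((score, traj))

    # Stage 3 (assumed): return the highest-scoring remaining mode.
    return max(rescored, key=lambda pair: pair[0])


# Hypothetical example: a stationary agent blocks the straight-ahead mode.
candidates = {
    "straight": [
        (0.9, [(0.0, 0.0), (0.0, 5.0), (0.0, 10.0)]),
        (0.6, [(0.0, 0.0), (2.0, 5.0), (4.0, 10.0)]),
    ],
}
agents = [[(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]]
best_score, best_traj = select_plan(candidates, "straight", agents)
```

In this toy setup the candidate with the higher initial score ends at the agent's position, so its score is zeroed and the lower-scored but collision-free candidate is returned. A real system would use agent extents and multi-modal motion predictions rather than a single distance threshold.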
