We propose UAD, a method for vision-based end-to-end autonomous driving (E2EAD) that achieves the best open-loop evaluation performance on nuScenes while showing robust closed-loop driving quality in CARLA. Our motivation stems from the observation that current E2EAD models still mimic the modular architecture of typical driving stacks, with carefully designed supervised perception and prediction subtasks that provide environment information for planning. Despite groundbreaking progress, such a design has certain drawbacks: 1) the preceding subtasks require massive high-quality 3D annotations as supervision, posing a significant impediment to scaling the training data; 2) each submodule entails substantial computation overhead in both training and inference. To this end, we propose UAD, an E2EAD framework with an unsupervised proxy that addresses all of these issues. First, we design a novel Angular Perception Pretext to eliminate the annotation requirement: the pretext models the driving scene by predicting angular-wise spatial objectness and temporal dynamics, without manual annotation. Second, we propose a self-supervised training strategy that learns the consistency of the predicted trajectories under different augmented views, enhancing planning robustness in steering scenarios. UAD achieves a 38.7% relative improvement over UniAD on the average collision rate in nuScenes and surpasses VAD by 41.32 points on the driving score in CARLA's Town05 Long benchmark. Moreover, the proposed method consumes only 44.3% of UniAD's training resources and runs 3.4 times faster at inference. Our design not only demonstrates, for the first time, clear performance advantages over supervised counterparts, but also enjoys unprecedented efficiency in data, training, and inference. Code and models will be released at https://github.com/KargoBot_Research/UAD.
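The Angular Perception Pretext models the scene by angular-wise spatial objectness, i.e. the space around the ego vehicle is partitioned into angular sectors. Below is a toy numpy sketch of that partition only; the sector count, the bin assignment via `arctan2`, and the use of simple occupancy as a stand-in for objectness targets are our illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def angular_sectors(points_xy, num_sectors=12):
    """Assign each BEV point (N, 2) to one of num_sectors equal angular
    bins around the ego vehicle at the origin, and return a per-sector
    occupancy flag (a crude stand-in for angular objectness targets)."""
    # Angle of each point in [-pi, pi), measured from the +x axis.
    angles = np.arctan2(points_xy[:, 1], points_xy[:, 0])
    # Shift to [0, 2*pi) and quantize into equal angular bins.
    bins = ((angles + np.pi) / (2 * np.pi) * num_sectors).astype(int) % num_sectors
    occupancy = np.zeros(num_sectors, dtype=bool)
    occupancy[bins] = True  # a sector is "occupied" if any point falls in it
    return bins, occupancy
```

A per-sector prediction head would then be trained against such sector-level targets instead of per-object 3D boxes, which is what removes the need for 3D annotation.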
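The self-supervised consistency objective can be sketched as: plan a trajectory on an augmented view of the scene, map that prediction back into the original frame, and penalize disagreement with the trajectory planned on the original view. The following minimal numpy illustration assumes a rotation augmentation and an MSE penalty; the paper's actual augmentations and loss form may differ:

```python
import numpy as np

def rotate_xy(points, theta):
    """Rotate (N, 2) xy waypoints by angle theta (radians) about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s],
                  [s,  c]])
    return points @ R.T

def consistency_loss(traj_orig, traj_aug, theta):
    """MSE between the trajectory planned on the original view and the
    trajectory planned on a view rotated by theta, after rotating the
    augmented prediction back into the original frame."""
    traj_aug_back = rotate_xy(traj_aug, -theta)
    return float(np.mean((traj_orig - traj_aug_back) ** 2))
```

If the planner is perfectly equivariant to the augmentation, the loss is zero; minimizing it pushes the planner toward steering-consistent predictions without any trajectory labels.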