Vision-based autonomous driving shows great potential due to its satisfactory performance and low cost. Most existing methods adopt dense representations (e.g., bird's eye view) or sparse representations (e.g., instance boxes) for decision-making, which suffer from a trade-off between comprehensiveness and efficiency. This paper explores a Gaussian-centric end-to-end autonomous driving (GaussianAD) framework that exploits 3D semantic Gaussians to describe the scene extensively yet sparsely. We initialize the scene with uniform 3D Gaussians and progressively refine them using surrounding-view images to obtain the 3D Gaussian scene representation. We then use sparse convolutions to efficiently perform 3D perception (e.g., 3D detection, semantic map construction). We predict 3D flows for the Gaussians with dynamic semantics and plan the ego trajectory accordingly with a future scene forecasting objective. GaussianAD can be trained end-to-end, with perception labels used optionally when available. Extensive experiments on the widely used nuScenes dataset verify the effectiveness of our end-to-end GaussianAD on various tasks including motion planning, 3D occupancy prediction, and 4D occupancy forecasting. Code: https://github.com/wzzheng/GaussianAD.
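The pipeline above can be illustrated with a minimal sketch: initializing a uniform grid of 3D semantic Gaussians and advancing the dynamic ones along predicted 3D flows for forecasting. The parameterization (mean, scale, rotation quaternion, semantic logits) and all function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def init_uniform_gaussians(x_range, y_range, z_range, n_per_axis, n_classes):
    """Initialize a uniform grid of 3D semantic Gaussians.

    Each Gaussian carries a mean (3), scale (3), rotation quaternion (4),
    and semantic logits (n_classes). The exact layout is a hypothetical
    sketch; the paper refines these properties from surrounding-view images.
    """
    xs = np.linspace(*x_range, n_per_axis)
    ys = np.linspace(*y_range, n_per_axis)
    zs = np.linspace(*z_range, n_per_axis)
    means = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), -1).reshape(-1, 3)
    n = means.shape[0]
    scales = np.full((n, 3), 0.5)                      # isotropic initial extent
    rotations = np.tile([1.0, 0.0, 0.0, 0.0], (n, 1))  # identity quaternions
    semantics = np.zeros((n, n_classes))               # logits, refined later
    return {"means": means, "scales": scales,
            "rotations": rotations, "semantics": semantics}

def forecast_gaussians(gaussians, flows, dt):
    """Advance Gaussian means along per-Gaussian 3D flows (constant velocity)."""
    out = dict(gaussians)
    out["means"] = gaussians["means"] + dt * flows
    return out
```

A planner could score candidate ego trajectories against the forecast scene, which is how the future-forecasting objective couples perception and planning.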