End-to-end autonomous driving has attracted widespread attention. Current end-to-end approaches rely largely on supervision from perception tasks such as detection, tracking, and map segmentation to help learn scene representations. However, these methods require extensive annotations, which limits data scalability. To address this challenge, we propose a novel self-supervised method that enhances end-to-end driving without costly labels. Specifically, our framework \textbf{LAW} uses a LAtent World model to predict future latent features from the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the features actually observed in the future. Because the prediction depends on both the current latent feature and the predicted action, this supervision jointly optimizes latent feature learning and action prediction, which improves driving performance. As a result, our approach achieves state-of-the-art performance on both open-loop and closed-loop benchmarks without costly annotations.
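The self-supervised objective described above can be illustrated with a minimal sketch. It is not the LAW implementation: the class name `LatentWorldModelSketch`, the linear maps standing in for the planner and the world model, the dimensions, and the mean-squared-error loss are all assumptions made for illustration, and the image encoder and the gradient update are omitted.

```python
import random


def init_linear(out_dim, in_dim, rng):
    """Create a small random affine map as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(weights, bias, x):
    """Apply an affine map to a flat vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class LatentWorldModelSketch:
    """Hypothetical stand-in: linear planner plus linear latent world model."""

    def __init__(self, latent_dim, action_dim, seed=0):
        rng = random.Random(seed)
        # Planner head: current latent feature -> predicted ego action.
        self.planner = init_linear(action_dim, latent_dim, rng)
        # World model: (current latent, predicted action) -> future latent.
        self.world = init_linear(latent_dim, latent_dim + action_dim, rng)

    def predict_action(self, latent):
        return linear(*self.planner, latent)

    def predict_next_latent(self, latent, action):
        # Concatenate latent and action before applying the world model.
        return linear(*self.world, latent + action)

    def world_model_loss(self, latent_t, latent_next):
        # The observed future latent is the only supervision signal;
        # no detection, tracking, or map labels are involved.
        action = self.predict_action(latent_t)
        predicted = self.predict_next_latent(latent_t, action)
        return mse(predicted, latent_next)


# Usage with made-up latent features for two consecutive frames.
model = LatentWorldModelSketch(latent_dim=4, action_dim=2)
latent_t = [0.1, -0.2, 0.3, 0.0]
latent_next = [0.0, -0.1, 0.25, 0.05]
loss = model.world_model_loss(latent_t, latent_next)
```

In this sketch the loss is a function of both the planner parameters (through the predicted action) and the world-model parameters, so minimizing it with any gradient-based optimizer would update both. That is the sense in which the supervision is joint.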