End-to-end differentiable learning has recently become a prominent paradigm for autonomous driving (AD). One main bottleneck is its voracious appetite for high-quality labeled data, such as 3D bounding boxes and semantic segmentation, which are notoriously expensive to annotate manually. The difficulty is compounded by the fact that driving behaviors in AD data often follow a long-tailed distribution: a large part of the collected data can be trivial (e.g., simply driving forward on a straight road), and only a few cases are safety-critical. In this paper, we explore a practically important yet under-explored problem: how to achieve sample and label efficiency for end-to-end AD. Specifically, we design a planning-oriented active learning method that progressively annotates part of the collected raw data according to the proposed diversity and usefulness criteria for planning routes. Empirically, we show that our planning-oriented approach outperforms general active learning methods by a large margin. Notably, our method achieves performance comparable to state-of-the-art end-to-end AD methods while using only 30% of the nuScenes data. We hope our work inspires future work to explore end-to-end AD from a data-centric perspective in addition to methodological efforts.