It is a long-term vision for the Autonomous Driving (AD) community that perception models can learn from a large-scale point cloud dataset to obtain unified representations that achieve promising results across different tasks and benchmarks. Previous works mainly focus on the self-supervised pre-training pipeline, in which pre-training and fine-tuning are performed on the same benchmark; this makes it difficult for the pre-training checkpoint to scale in performance or to transfer across datasets. In this paper, for the first time, we commit to building a large-scale point-cloud pre-training dataset with a diverse data distribution, while learning generalizable representations from it. We formulate point-cloud pre-training as a semi-supervised problem that leverages few-shot labeled and massive unlabeled point-cloud data to generate unified backbone representations that can be directly applied to many baseline models and benchmarks, decoupling the AD-related pre-training process from the downstream fine-tuning task. During backbone pre-training, by enhancing scene- and instance-level distribution diversity and exploiting the backbone's ability to learn from unknown instances, we achieve significant performance gains on a series of downstream perception benchmarks, including Waymo, nuScenes, and KITTI, under different baseline models such as PV-RCNN++, SECOND, and CenterPoint.