Annotating 3D LiDAR point clouds for perception tasks such as 3D object detection and LiDAR semantic segmentation is notoriously time- and energy-consuming. To alleviate the labeling burden, a promising approach is to perform large-scale pre-training and then fine-tune the pre-trained backbone on different downstream datasets and tasks. In this paper, we propose SPOT (Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations) and demonstrate its effectiveness on various public datasets with different downstream tasks under the label-efficiency setting. Our contributions are threefold: (1) Extensive experiments on numerous datasets and tasks show that occupancy prediction is promising for learning general representations. (2) To overcome the domain gap caused by different LiDAR sensors and annotation strategies across datasets, SPOT uses a beam re-sampling technique for point cloud augmentation and applies class-balancing strategies. (3) We observe scalable pre-training: downstream performance improves across all experiments as more pre-training data is used. We believe our findings can facilitate the understanding of LiDAR point clouds and pave the way for future exploration in LiDAR pre-training. Code and models will be released.
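The beam re-sampling augmentation named in contribution (2) can be illustrated with a minimal sketch. It assumes that each point's beam index can be approximated by binning its elevation angle into equal-width bins, and that re-sampling means keeping every k-th beam to mimic a sensor with fewer beams. The functions `elevation_deg`, `assign_beams`, and `beam_resample` are hypothetical and are not SPOT's implementation; real sensors have non-uniform beam spacing, so uniform binning is a simplification.

```python
import math


def elevation_deg(point):
    """Elevation angle (degrees) of a point (x, y, z, ...) relative to the sensor origin."""
    x, y, z = point[:3]
    return math.degrees(math.atan2(z, math.hypot(x, y)))


def assign_beams(points, num_beams):
    """Hypothetical helper: bin points into `num_beams` equal-width elevation bins.

    Assumption: beams are evenly spaced in elevation, which real LiDARs only approximate.
    """
    if not points:
        return []
    angles = [elevation_deg(p) for p in points]
    lo, hi = min(angles), max(angles)
    # Fall back to width 1.0 when all points share one elevation (avoids division by zero).
    width = (hi - lo) / num_beams or 1.0
    # Clamp so the topmost point falls into the last bin instead of index num_beams.
    return [min(int((a - lo) / width), num_beams - 1) for a in angles]


def beam_resample(points, num_beams=64, keep_every=2, offset=0):
    """Hypothetical augmentation: keep only points on every `keep_every`-th beam.

    For example, num_beams=64 and keep_every=2 approximates a 32-beam scan;
    `offset` selects which subset of beams is kept.
    """
    beams = assign_beams(points, num_beams)
    return [p for p, b in zip(points, beams) if (b - offset) % keep_every == 0]
```

For example, `beam_resample(points, num_beams=64, keep_every=2)` would turn a 64-beam scan into an approximately 32-beam scan, which is one plausible way to make a pre-trained backbone less sensitive to the beam count of the downstream sensor.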