Annotating 3D LiDAR point clouds for perception tasks such as 3D object detection and LiDAR semantic segmentation is notoriously time-consuming and labor-intensive. To reduce the labeling burden, a promising approach is to pre-train a backbone at large scale and then fine-tune it on different downstream datasets and tasks. In this paper, we propose SPOT (Scalable Pre-training via Occupancy prediction for learning Transferable 3D representations) and demonstrate its effectiveness on various public datasets and downstream tasks under the label-efficiency setting. Our contributions are threefold: (1) We show that occupancy prediction is a promising task for learning general representations, as demonstrated by extensive experiments across many datasets and tasks. (2) SPOT uses a beam re-sampling technique for point cloud augmentation and applies class-balancing strategies to overcome the domain gap introduced by the different LiDAR sensors and annotation strategies used across datasets. (3) We observe scalable pre-training: downstream performance improves across all experiments as the amount of pre-training data grows. We believe that our findings can facilitate the understanding of LiDAR point clouds and pave the way for future exploration in LiDAR pre-training. Code and models will be released.