The LiDAR sensing process inevitably leaves large blind spots in large-scale point clouds, i.e., regions not visible to the sensor. We demonstrate how these inherent sampling properties can be effectively utilized for self-supervised representation learning by designing a highly effective pre-training framework that considerably reduces the amount of tedious 3D annotation needed to train state-of-the-art object detectors. Our Masked AutoEncoder for LiDAR point clouds (MAELi) intuitively leverages the sparsity of LiDAR point clouds in both the encoder and the decoder during reconstruction. This results in a more expressive and useful initialization, which can be directly applied to downstream perception tasks for autonomous driving, such as 3D object detection or semantic segmentation. In a novel reconstruction approach, MAELi distinguishes between empty and occluded space and employs a new masking strategy that targets the LiDAR's inherent spherical projection. In this way, without any ground truth whatsoever and trained on single frames only, MAELi learns an understanding of the underlying 3D scene geometry and semantics. To demonstrate the potential of MAELi, we pre-train backbones in an end-to-end manner and show the effectiveness of our unsupervised pre-trained weights on the tasks of 3D object detection and semantic segmentation.
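The abstract does not give implementation details, so the following is only a rough NumPy sketch of the two ideas it names, under the assumption that the sensor's "spherical projection" can be approximated by binning points into azimuth/elevation cells around the sensor origin. All function names, the bin counts `n_az` and `n_el`, and the tolerance `tol` are hypothetical and not taken from MAELi. `spherical_mask` hides whole angular cells (entire groups of beams) instead of individual points, and `classify_space` labels query locations along each sensor ray as empty (in front of the first return), surface (at it), occluded (behind it), or unknown (no return in that direction).

```python
import numpy as np

# Hypothetical labels for free-space reasoning along sensor rays.
EMPTY, SURFACE, OCCLUDED, UNKNOWN = 0, 1, 2, 3


def angular_cells(points, n_az, n_el):
    """Map points (sensor at origin) to azimuth/elevation cell ids and ranges."""
    r = np.linalg.norm(points, axis=1)
    az = np.arctan2(points[:, 1], points[:, 0])                      # [-pi, pi]
    el = np.arcsin(np.clip(points[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))  # [-pi/2, pi/2]
    az_bin = np.minimum(((az + np.pi) / (2 * np.pi) * n_az).astype(int), n_az - 1)
    el_bin = np.minimum(((el + np.pi / 2) / np.pi * n_el).astype(int), n_el - 1)
    return az_bin * n_el + el_bin, r


def spherical_mask(points, n_az=360, n_el=32, mask_ratio=0.5, seed=0):
    """Mask whole occupied angular cells; returns (visible, masked) points."""
    cells, _ = angular_cells(points, n_az, n_el)
    occupied = np.unique(cells)
    rng = np.random.default_rng(seed)
    n_masked = int(round(mask_ratio * len(occupied)))
    masked_cells = rng.choice(occupied, size=n_masked, replace=False)
    is_masked = np.isin(cells, masked_cells)
    return points[~is_masked], points[is_masked]


def classify_space(points, queries, n_az=360, n_el=32, tol=0.1):
    """Label queries as empty / surface / occluded / unknown w.r.t. the scan."""
    cells, ranges = angular_cells(points, n_az, n_el)
    # The nearest return in each cell approximates the first hit along that ray.
    first_hit = np.full(n_az * n_el, np.inf)
    np.minimum.at(first_hit, cells, ranges)

    q_cells, q_ranges = angular_cells(queries, n_az, n_el)
    hit = first_hit[q_cells]
    seen = np.isfinite(hit)  # assumption: cells without a return stay unknown

    labels = np.full(len(queries), UNKNOWN)
    labels[seen & (q_ranges < hit - tol)] = EMPTY        # ray passed through here
    labels[seen & (np.abs(q_ranges - hit) <= tol)] = SURFACE
    labels[seen & (q_ranges > hit + tol)] = OCCLUDED     # hidden behind the hit
    return labels


# Usage: one return straight ahead at 10 m; probe points along and beside that ray.
scan = np.array([[10.0, 0.0, 0.0]])
probes = np.array([[5.0, 0.0, 0.0], [10.05, 0.0, 0.0], [15.0, 0.0, 0.0], [0.0, 10.0, 0.0]])
print(classify_space(scan, probes))
```

Masking by angular cell rather than by random point is meant to mimic the structured gaps a real sensor produces, and the empty/occluded split reflects that space in front of a return was observed to be free, whereas space behind it was never seen. Treating cells with no return as unknown is a simplification; a real sensor may also report no return for open sky or low-reflectivity surfaces.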