Moving object segmentation (MOS) on LiDAR point clouds is crucial for autonomous systems such as self-driving vehicles. Previous supervised approaches rely heavily on costly manual annotations, while LiDAR sequences naturally capture temporal motion cues that can be leveraged for self-supervised learning. In this paper, we propose Temporal Overlapping Prediction (TOP), a self-supervised pre-training method that alleviates the labeling burden for MOS. TOP explores temporal overlapping points that are commonly observed by the current and adjacent scans, and learns spatiotemporal representations by predicting the occupancy states of these points. Moreover, we utilize current occupancy reconstruction as an auxiliary pre-training objective, which enhances the model's awareness of the current scene structure. Through extensive experiments, we observe that the conventional Intersection-over-Union (IoU) metric is strongly biased toward objects with more scanned points, which may neglect small or distant objects. To compensate for this bias, we introduce an additional metric, mIoU_obj, to evaluate object-level performance. Experiments on nuScenes and SemanticKITTI show that TOP outperforms both the supervised training-from-scratch baseline and other self-supervised pre-training baselines by up to 28.77% relative improvement, demonstrating strong transferability across LiDAR setups and generalization to other tasks. Code and pre-trained models will be made publicly available upon publication.
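The point-count bias of IoU mentioned above can be illustrated with a toy example. The sketch below (illustrative only; the variable names and the exact formulation of mIoU_obj are assumptions, not taken from the paper) pools all points for the conventional IoU, whereas an object-level score averages per-object IoUs, so a completely missed small object is no longer masked by a well-segmented large one:

```python
import numpy as np

# Toy scene: one large object (1000 points) and one small object (20 points).
# gt/pred are boolean "moving" masks over all points; obj assigns each point
# to an object id. All names here are illustrative, not from the paper.
gt_large = np.ones(1000, dtype=bool)
pred_large = np.ones(1000, dtype=bool)   # large object fully segmented
gt_small = np.ones(20, dtype=bool)
pred_small = np.zeros(20, dtype=bool)    # small object entirely missed

gt = np.concatenate([gt_large, gt_small])
pred = np.concatenate([pred_large, pred_small])
obj = np.concatenate([np.zeros(1000, dtype=int), np.ones(20, dtype=int)])

def iou(g, p):
    """Intersection-over-Union of two boolean masks."""
    union = np.logical_or(g, p).sum()
    return np.logical_and(g, p).sum() / union if union else 1.0

point_iou = iou(gt, pred)                 # pooled over all points
obj_ious = [iou(gt[obj == k], pred[obj == k]) for k in np.unique(obj)]
miou_obj = float(np.mean(obj_ious))       # averaged over objects

print(f"point-level IoU: {point_iou:.3f}")  # ~0.980: the miss is hidden
print(f"mIoU_obj:        {miou_obj:.3f}")   # 0.500: the miss is exposed
```

Because the pooled IoU weights every point equally, the 1000-point object dominates the score; averaging per-object IoUs instead weights every object equally, which is the intuition behind an object-level metric.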