Efficient data utilization is crucial for advancing 3D scene understanding in autonomous driving, where the reliance on heavily human-annotated LiDAR point clouds challenges fully supervised methods. To address this, our study extends semi-supervised learning to LiDAR semantic segmentation, leveraging the intrinsic spatial priors of driving scenes and the complementarity of multiple sensors to better exploit unlabeled data. We introduce LaserMix++, an evolved framework that integrates laser beam manipulations across disparate LiDAR scans and incorporates LiDAR-camera correspondences to further assist data-efficient learning. The framework enhances 3D scene consistency regularization with multi-modal components: 1) a multi-modal LaserMix operation for fine-grained cross-sensor interactions; 2) camera-to-LiDAR feature distillation that strengthens LiDAR feature learning; and 3) language-driven knowledge guidance that generates auxiliary supervision with open-vocabulary models. The versatility of LaserMix++ enables applications across LiDAR representations, establishing it as a universally applicable solution. The framework is rigorously validated through theoretical analysis and extensive experiments on popular driving perception datasets. Results demonstrate that LaserMix++ markedly outperforms fully supervised alternatives, achieving comparable accuracy with five times fewer annotations and significantly improving over supervised-only baselines. This substantial advancement underscores the potential of semi-supervised approaches to reduce the reliance on extensive labeled data in LiDAR-based 3D scene understanding systems.
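The laser beam manipulation underlying LaserMix can be illustrated with a minimal sketch: two LiDAR scans are partitioned into bins by laser inclination (pitch) angle, and alternating bins are swapped between the scans to form a mixed point cloud. The function name, the pitch range, and the even/odd bin assignment below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def laser_mix(points_a, points_b, num_bins=4, pitch_range=(-25.0, 3.0)):
    """Mix two (N, 3) point clouds by alternating inclination-angle bins.

    Illustrative sketch only: pitch_range roughly follows a typical
    automotive LiDAR field of view; both values are assumptions.
    """
    def bin_index(pts):
        # Inclination angle of each point relative to the sensor origin.
        pitch = np.degrees(
            np.arctan2(pts[:, 2], np.linalg.norm(pts[:, :2], axis=1))
        )
        lo, hi = pitch_range
        idx = np.floor((pitch - lo) / (hi - lo) * num_bins).astype(int)
        return np.clip(idx, 0, num_bins - 1)

    bins_a, bins_b = bin_index(points_a), bin_index(points_b)
    # Take even-indexed bins from scan A and odd-indexed bins from scan B,
    # producing a spatially interleaved mixture of the two scans.
    mixed = np.concatenate(
        [points_a[bins_a % 2 == 0], points_b[bins_b % 2 == 1]], axis=0
    )
    return mixed
```

In the full framework this mixing is applied consistently to points and their (pseudo-)labels, so that predictions on mixed scans can be regularized against predictions on the originals.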