Transfer learning is a proven technique in 2D computer vision: it leverages the large amount of available data to achieve high performance on datasets whose size is limited by the cost of acquisition or annotation. In 3D, annotation is known to be costly; nevertheless, pre-training methods have only recently been investigated, and because of this cost, unsupervised pre-training has been heavily favored. In this work, we tackle real-time 3D semantic segmentation of sparse autonomous driving LiDAR scans. Such datasets are increasingly being released, but each has its own label set. We propose an intermediate-level label set, called coarse labels, which can easily be applied to any existing or future autonomous driving dataset, allowing all available data to be leveraged at once without any additional manual labeling. This gives us access to a larger dataset together with a simple semantic segmentation task. Building on it, we introduce a new pre-training task: coarse label pre-training, or COLA. We thoroughly analyze the impact of COLA on various datasets and architectures and show that it yields a noticeable performance improvement, especially when only a small dataset is available for the fine-tuning task.
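As an illustration of the coarse-label idea, the sketch below shows one way dataset-specific label sets could be remapped onto a shared coarse set so that scans from several datasets can be pooled for pre-training. The coarse categories, the two datasets, their label IDs, and the function name `to_coarse` are all hypothetical and chosen only for this example; they are not the label set or code used in this work.

```python
from typing import Dict, List

# Hypothetical coarse label set (illustrative only, not the paper's categories).
COARSE_CLASSES = ["ignore", "vehicle", "person", "ground", "structure", "vegetation"]
COARSE_ID = {name: idx for idx, name in enumerate(COARSE_CLASSES)}

# Hypothetical fine-label-ID -> coarse-name tables for two made-up datasets.
# Each dataset has its own label IDs, but both map into the same coarse set.
DATASET_A_TO_COARSE: Dict[int, str] = {
    0: "ignore", 1: "vehicle", 2: "vehicle", 3: "person",
    4: "ground", 5: "structure", 6: "vegetation",
}
DATASET_B_TO_COARSE: Dict[int, str] = {
    0: "ignore", 10: "vehicle", 11: "person", 12: "person",
    20: "ground", 21: "ground", 30: "vegetation",
}


def to_coarse(fine_labels: List[int], fine_to_coarse: Dict[int, str]) -> List[int]:
    """Map per-point fine label IDs to coarse label IDs.

    Fine IDs missing from the table fall back to "ignore" (an assumption
    made for this sketch).
    """
    return [COARSE_ID[fine_to_coarse.get(label, "ignore")] for label in fine_labels]


# Per-point labels of one toy scan from each hypothetical dataset.
scan_a = [1, 2, 4, 6, 3]
scan_b = [10, 12, 21, 30, 99]  # 99 is not in the table, so it maps to "ignore"

coarse_a = to_coarse(scan_a, DATASET_A_TO_COARSE)
coarse_b = to_coarse(scan_b, DATASET_B_TO_COARSE)

# After remapping, both scans share one label space and can be pooled.
pooled = coarse_a + coarse_b
```

Because the remapping is a fixed lookup from existing annotations, it requires no new manual labeling, which is the property the abstract relies on.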