Current state-of-the-art (SOTA) 3D object detection methods often require a large number of 3D bounding box annotations for training. However, collecting such large-scale, densely supervised datasets is notoriously costly. To reduce this annotation effort, we propose a novel sparsely annotated framework in which only one 3D object is annotated per scene. Such a sparse annotation strategy can significantly reduce the annotation burden, but the resulting inexact and incomplete supervision may severely degrade detection performance. To address this issue, we develop the SS3D++ method, which alternately improves 3D detector training and confident fully annotated scene generation in a unified learning scheme. Using the sparse annotations as seeds, we progressively generate confident fully annotated scenes with a missing-annotated instance mining module and a reliable background mining module. Our method produces competitive results compared with SOTA weakly supervised methods that use the same or even higher annotation cost. Moreover, compared with SOTA fully supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x lower annotation cost, and 90% of their performance on the Waymo dataset with about 15x lower annotation cost. Additional unlabeled training scenes can further boost performance. The code will be available at https://github.com/gaocq/SS3D2.
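The alternating scheme described in the abstract can be illustrated with a minimal Python sketch. This is not the SS3D++ implementation: the one-dimensional "features", the prototype-based "detector", the thresholds `hi` and `lo`, and every name below (`Scene`, `train_detector`, `mine`, `run`) are hypothetical and chosen only to show the control flow. Each scene starts with a single seed annotation; each round retrains the toy detector on the current labels, promotes high-confidence candidates to pseudo-labels (standing in for missing-annotated instance mining), and marks low-confidence candidates as reliable background.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Scene:
    # One toy scalar feature per candidate region (assumption: real detectors use 3D boxes).
    features: List[float]
    # Indices currently treated as annotated objects (starts with one seed per scene).
    labeled: Set[int] = field(default_factory=set)
    # Indices currently treated as reliable background.
    background: Set[int] = field(default_factory=set)


def train_detector(scenes: List[Scene]) -> float:
    """Toy 'detector': the mean feature over all currently labeled objects."""
    values = [s.features[i] for s in scenes for i in s.labeled]
    return sum(values) / len(values)


def score(prototype: float, x: float) -> float:
    """Toy confidence in [0, 1]: closer to the prototype gives a higher score."""
    return max(0.0, 1.0 - abs(x - prototype))


def mine(scenes: List[Scene], prototype: float, hi: float = 0.8, lo: float = 0.3) -> None:
    """Promote confident candidates to labels and mark clear negatives as background."""
    for s in scenes:
        for i, x in enumerate(s.features):
            if i in s.labeled:
                continue
            c = score(prototype, x)
            if c >= hi:        # stand-in for missing-annotated instance mining
                s.labeled.add(i)
                s.background.discard(i)
            elif c <= lo:      # stand-in for reliable background mining
                s.background.add(i)
            # Candidates between lo and hi stay ambiguous and are left unassigned.


def run(scenes: List[Scene], rounds: int = 3) -> List[Scene]:
    """Alternate between detector training and scene completion."""
    for _ in range(rounds):
        prototype = train_detector(scenes)
        mine(scenes, prototype)
    return scenes


if __name__ == "__main__":
    demo = run([
        Scene([1.0, 0.9, 0.1], labeled={0}),
        Scene([0.95, 0.05, 1.1], labeled={0}),
    ])
    for s in demo:
        print(sorted(s.labeled), sorted(s.background))
```

In this toy setting the two thresholds leave an ambiguous band between `lo` and `hi`, so uncertain candidates are assigned neither to the object set nor to the background set. This is one simple way to keep unreliable pseudo-labels out of training.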