Current state-of-the-art (SOTA) 3D object detection methods often require large amounts of 3D bounding box annotations for training. However, collecting such large-scale, densely-supervised datasets is notoriously costly. To reduce the cumbersome data annotation process, we propose a novel sparsely-annotated framework in which we annotate only one 3D object per scene. Such a sparse annotation strategy significantly reduces the heavy annotation burden, but the resulting inexact and incomplete supervision may severely deteriorate detection performance. To address this issue, we develop SS3D++, a method that alternately improves 3D detector training and confident fully-annotated scene generation in a unified learning scheme. Using the sparse annotations as seeds, we progressively generate confident fully-annotated scenes through a missing-annotated instance mining module and a reliable background mining module. Our proposed method produces competitive results compared with SOTA weakly-supervised methods that use the same or even higher annotation costs. Moreover, compared with SOTA fully-supervised methods, we achieve on-par or even better performance on the KITTI dataset with about 5x less annotation cost, and 90% of their performance on the Waymo dataset with about 15x less annotation cost. Additional unlabeled training scenes can further boost performance.