3D object detection plays a crucial role in applications such as autonomous vehicles, robotics, and augmented reality. However, training 3D detectors requires costly, precise annotations, which hinders scaling to large datasets. To address this challenge, we propose a weakly supervised 3D annotator that relies solely on 2D bounding box annotations from images, along with size priors. A major problem is that supervising a 3D detection model with only 2D boxes is unreliable, because different 3D poses can share an identical 2D projection. We introduce a simple yet effective and generic solution: we build 3D proxy objects whose annotations are known by construction and add them to the training dataset. Our method requires only size priors to adapt to new classes. To better align 2D supervision with 3D detection, our method ensures depth invariance through a novel formulation of the 2D losses. Finally, to detect more challenging instances, our annotator follows an offline pseudo-labelling scheme that gradually improves its 3D pseudo-labels. Extensive experiments on the KITTI dataset demonstrate that our method not only performs on par with or better than previous works on the Car category, but also achieves performance close to fully supervised methods on more challenging classes. We further demonstrate the effectiveness and robustness of our method by being the first to experiment on the more challenging nuScenes dataset. We additionally propose a setting where weak labels are obtained from a 2D detector pre-trained on MS-COCO instead of from human annotations.
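The ambiguity between 3D poses and their 2D projection can be illustrated with a small, self-contained sketch. It is not the method described above, only a toy pinhole-camera example. The intrinsics (`focal`, `cu`, `cv`), the box values, and the helper names `box_corners` and `project_to_2d_box` are assumptions chosen for illustration. It shows two cases that yield the same 2D box: a heading flipped by 180 degrees, and a box that is both larger and proportionally farther away. The second case suggests why a size prior helps constrain depth.

```python
import math

def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box in camera coordinates
    (x right, y down, z forward). size = (length, height, width)."""
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate about the vertical (y) axis, then translate.
                x = cx + c * dx + s * dz
                z = cz - s * dx + c * dz
                corners.append((x, cy + dy, z))
    return corners

def project_to_2d_box(corners, focal=700.0, cu=600.0, cv=180.0):
    """Pinhole projection, then the tight axis-aligned 2D box
    (u_min, v_min, u_max, v_max). Intrinsics are illustrative assumptions."""
    us = [focal * x / z + cu for x, _, z in corners]
    vs = [focal * y / z + cv for _, y, z in corners]
    return (min(us), min(vs), max(us), max(vs))

# Hypothetical car-like box, 20 m in front of the camera.
center, size, yaw = (2.0, 1.0, 20.0), (4.0, 1.5, 1.8), 0.3
ref = project_to_2d_box(box_corners(center, size, yaw))

# Case 1: heading flipped by 180 degrees (same corner set, same 2D box).
flipped = project_to_2d_box(box_corners(center, size, yaw + math.pi))

# Case 2: box 1.5x larger and 1.5x farther along the same viewing ray.
far_center = tuple(1.5 * v for v in center)
big_size = tuple(1.5 * v for v in size)
far_big = project_to_2d_box(box_corners(far_center, big_size, yaw))

print(ref)
print(flipped)
print(far_big)
```

All three printed boxes agree up to floating-point error, so a loss defined only on the 2D box cannot distinguish these 3D hypotheses without additional cues.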