With the rapid development of large models, the demand for data has become increasingly pressing. In 3D object detection especially, costly manual annotation has hindered further progress. To reduce the annotation burden, we study the problem of achieving 3D object detection based solely on 2D annotations. Thanks to advanced 3D reconstruction techniques, it is now feasible to reconstruct the overall static 3D scene. However, extracting precise object-level annotations from the reconstructed scene and generalizing these limited annotations to the entire scene remain challenging. In this paper, we introduce a novel paradigm called BA$^2$-Det, encompassing pseudo label generation and multi-stage generalization. We devise the DoubleClustering algorithm to obtain object clusters from reconstructed scene-level points, and further enhance the model's detection capabilities through three stages of generalization: from complete to partial, from static to dynamic, and from close to distant. Experiments conducted on the large-scale Waymo Open Dataset show that the performance of BA$^2$-Det is on par with fully supervised methods that use 10% of the annotations. Additionally, using large raw videos for pretraining, BA$^2$-Det can achieve a 20% relative improvement on the KITTI dataset. The method also has great potential for detecting open-set 3D objects in complex scenes. Project page: https://ba2det.site.