Weakly supervised 3D object detection aims to learn a 3D detector at lower annotation cost, e.g., from 2D labels. Unlike prior work, which still relies on a few accurate 3D annotations, we propose a framework that leverages constraints between the 2D and 3D domains without requiring any 3D labels. Specifically, we use visual data from three perspectives to connect the two domains. First, we design a feature-level constraint that aligns LiDAR and image features based on object-aware regions. Second, we develop an output-level constraint that enforces overlap between the 2D boxes and the projected 3D box estimates. Finally, we apply a training-level constraint by producing accurate and consistent 3D pseudo-labels that align with the visual data. We conduct extensive experiments on the KITTI dataset to validate the effectiveness of the three proposed constraints. Without using any 3D labels, our method performs favorably against state-of-the-art approaches and is competitive with a method that uses 500 frames of 3D annotations. Code and models will be made publicly available at https://github.com/kuanchihhuang/VG-W3D.
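To make the output-level constraint concrete, the following is a minimal sketch of one way an overlap term between a 2D box and a projected 3D box could be computed. It is not the authors' implementation: the function names, the `1 - IoU` form of the loss, the use of the geometric box center, and the assumption that all corners lie in front of the camera are illustrative choices that the abstract does not specify.

```python
import numpy as np


def box3d_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a 3D box in camera coordinates.

    Assumed convention (hypothetical, for illustration): x right, y down,
    z forward; `center` is the geometric box center; `size` is
    (length, height, width); `yaw` rotates about the y axis.
    """
    l, h, w = size
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    y = np.array([h, h, h, h, -h, -h, -h, -h]) / 2.0
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot_y @ np.vstack([x, y, z])).T + np.asarray(center, dtype=float)


def project_to_2d_box(corners, P):
    """Project 3D corners with a 3x4 camera matrix P and return the
    enclosing axis-aligned box [x1, y1, x2, y2] in pixels.

    Assumes every corner has positive depth (lies in front of the camera).
    """
    homo = np.hstack([corners, np.ones((corners.shape[0], 1))])
    uvw = homo @ P.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return np.array([uv[:, 0].min(), uv[:, 1].min(),
                     uv[:, 0].max(), uv[:, 1].max()])


def iou_2d(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def overlap_loss(box2d, center, size, yaw, P):
    """Illustrative overlap penalty: 1 - IoU between the 2D box and the
    image-plane box enclosing the projected 3D estimate."""
    projected = project_to_2d_box(box3d_corners(center, size, yaw), P)
    return 1.0 - iou_2d(box2d, projected)
```

In this sketch the penalty is zero when the projected 3D estimate exactly fills the 2D box and one when the two do not intersect. A training setup would need a differentiable version of the same computation in the chosen deep learning framework.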