Weakly supervised 3D object detection aims to learn a 3D detector at a lower annotation cost, e.g., from 2D labels. Unlike prior work, which still relies on a few accurate 3D annotations, we propose a framework that leverages constraints between the 2D and 3D domains without requiring any 3D labels. Specifically, we employ visual data from three perspectives to establish connections between the 2D and 3D domains. First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions. Second, we develop an output-level constraint that enforces overlap between the 2D boxes and the projected 3D box estimates. Finally, we apply a training-level constraint by producing accurate and consistent 3D pseudo-labels that align with the visual data. We conduct extensive experiments on the KITTI dataset to validate the effectiveness of the three proposed constraints. Without using any 3D labels, our method achieves favorable performance against state-of-the-art approaches and is competitive with a method that uses 500 frames of 3D annotations. Code and models will be made publicly available at https://github.com/kuanchihhuang/VG-W3D.
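To make the output-level constraint concrete, the following is a minimal sketch of the underlying geometry, not the loss used in the paper. It projects the eight corners of a 3D box into the image, takes the enclosing 2D rectangle, and scores it against a 2D box with one minus IoU. All function names, the box parameterization (geometric center, size as length, height, width, yaw about the camera y axis), and the choice of a plain IoU penalty are assumptions made for illustration; a trainable version would need a differentiable formulation and handling of boxes behind the camera.

```python
import math

def box3d_corners(center, size, yaw):
    """Eight corners of a 3D box in camera coordinates (x right, y down, z forward).
    Assumption: center is the geometric center, size is (length, height, width),
    and yaw is a rotation about the camera y axis."""
    cx, cy, cz = center
    l, h, w = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-l / 2, l / 2):
        for dy in (-h / 2, h / 2):
            for dz in (-w / 2, w / 2):
                # Rotate the local offset about the y axis, then translate.
                x = c * dx + s * dz
                z = -s * dx + c * dz
                corners.append((cx + x, cy + dy, cz + z))
    return corners

def project_to_image(points, P):
    """Project 3D points with a 3x4 camera projection matrix P (list of rows).
    Assumes every point lies in front of the camera (positive depth)."""
    pixels = []
    for x, y, z in points:
        u, v, d = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
        pixels.append((u / d, v / d))
    return pixels

def enclosing_box(pixels):
    """Axis-aligned rectangle (x1, y1, x2, y2) that encloses all pixels."""
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    return (min(us), min(vs), max(us), max(vs))

def iou_2d(a, b):
    """Intersection over union of two (x1, y1, x2, y2) rectangles."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def overlap_penalty(box3d, box2d, P):
    """1 - IoU between the projected 3D box and a 2D box.
    box3d is a (center, size, yaw) tuple; box2d is (x1, y1, x2, y2)."""
    center, size, yaw = box3d
    projected = enclosing_box(project_to_image(box3d_corners(center, size, yaw), P))
    return 1.0 - iou_2d(projected, box2d)
```

The penalty is zero when the projected 3D estimate exactly fills the 2D box and reaches one when the two rectangles do not overlap, which is the behavior an overlap-enforcing constraint needs from its score.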