3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection

3D object detection is an important yet demanding task that relies heavily on difficult-to-obtain 3D annotations. To reduce the required amount of supervision, we propose 3DIoUMatch, a novel semi-supervised method for 3D object detection applicable to both indoor and outdoor scenes. We leverage a teacher-student mutual learning framework to propagate information from the labeled to the unlabeled training set in the form of pseudo-labels. However, due to the high task complexity, we observe that the pseudo-labels suffer from significant noise and are thus not directly usable. To address this, we introduce a confidence-based filtering mechanism inspired by FixMatch. We set confidence thresholds on the predicted objectness and class probability to filter out low-quality pseudo-labels. While this is effective, we observe that these two measures do not sufficiently capture localization quality. We therefore propose to use the estimated 3D IoU as a localization metric and set category-aware, self-adjusted thresholds to filter out poorly localized proposals. We adopt VoteNet as our backbone detector on the indoor datasets and PV-RCNN on the autonomous driving dataset, KITTI. Our method consistently improves on state-of-the-art methods on both the ScanNet and SUN-RGBD benchmarks by significant margins under all label ratios (including the fully labeled setting). For example, when training with only 10% labeled data on ScanNet, 3DIoUMatch achieves an absolute improvement of 7.7 on mAP@0.25 and 8.5 on mAP@0.5 over the prior art. On KITTI, we are the first to demonstrate semi-supervised 3D object detection, and our method surpasses a fully supervised baseline by 1.8% to 7.6% under different label ratios and categories.
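
The filtering step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Proposal` fields, the function name `filter_pseudo_labels`, and all threshold values are hypothetical placeholders. The abstract does not specify how the category-aware, self-adjusted IoU thresholds are computed, so the sketch simply takes a per-class threshold table as input.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proposal:
    """One teacher prediction on an unlabeled scene (hypothetical structure)."""
    label: str          # predicted class name
    objectness: float   # predicted probability that the box contains an object
    class_prob: float   # predicted probability of the chosen class
    pred_iou: float     # estimated 3D IoU between the box and the true object


def filter_pseudo_labels(
    proposals: List[Proposal],
    tau_obj: float,
    tau_cls: float,
    tau_iou: Dict[str, float],
) -> List[Proposal]:
    """Keep only proposals that pass all three confidence checks.

    tau_iou maps each class to its own IoU threshold. How those per-class
    values are chosen is outside this sketch (assumption: they are given).
    """
    kept = []
    for p in proposals:
        passes_objectness = p.objectness >= tau_obj
        passes_class = p.class_prob >= tau_cls
        passes_localization = p.pred_iou >= tau_iou[p.label]
        if passes_objectness and passes_class and passes_localization:
            kept.append(p)
    return kept


# Illustrative values only; none of these numbers come from the paper.
proposals = [
    Proposal("chair", objectness=0.95, class_prob=0.92, pred_iou=0.60),
    Proposal("chair", objectness=0.95, class_prob=0.92, pred_iou=0.10),
    Proposal("table", objectness=0.50, class_prob=0.99, pred_iou=0.80),
]
pseudo_labels = filter_pseudo_labels(
    proposals, tau_obj=0.9, tau_cls=0.9, tau_iou={"chair": 0.25, "table": 0.25}
)
```

In this toy input, the second chair is confidently classified but poorly localized, so only the predicted-IoU check rejects it. That is the case the objectness and class-probability thresholds alone would miss.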
