3D object detection plays an important role in autonomous driving and other robotics applications. However, these detectors usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging large amounts of unlabeled point cloud videos by semi-supervised learning of 3D object detectors via temporal graph neural networks. Our insight is that temporal smoothing can create more accurate detection results on unlabeled data, and these smoothed detections can then be used to retrain the detector. We learn to perform this temporal reasoning with a graph neural network, where edges represent the relationship between candidate detections in different time frames. After semi-supervised learning, our method achieves state-of-the-art detection performance on the challenging nuScenes and H3D benchmarks, compared to baselines trained on the same amount of labeled data. Project and code are released at https://www.jianrenw.com/SOD-TGNN/.
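To make the temporal-graph idea concrete, the following is a minimal sketch and not the released implementation. It assumes each candidate detection is reduced to a frame index, a 3D box center, and a confidence score. Edges link detections in nearby frames whose centers lie within a fixed radius. The learned graph neural network is replaced here by a hand-written neighbor-averaging step, which is a deliberate simplification. The function names (`build_temporal_edges`, `smooth_scores`) and the parameters `max_dt`, `radius`, and `self_weight` are hypothetical and chosen only for illustration.

```python
import numpy as np


def build_temporal_edges(frames, centers, max_dt=1, radius=2.0):
    """Link detections from different frames that are close in space.

    Returns an (E, 2) array of directed edges (i, j); both directions are
    included. max_dt and radius are illustrative assumptions.
    """
    frames = np.asarray(frames)
    centers = np.asarray(centers, dtype=float)
    dt = np.abs(frames[:, None] - frames[None, :])
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    adjacency = (dt >= 1) & (dt <= max_dt) & (dist <= radius)
    return np.argwhere(adjacency)


def smooth_scores(scores, edges, self_weight=0.5):
    """Blend each score with the mean score of its temporal neighbors.

    A hand-written stand-in for the learned message passing. Detections
    with no temporal neighbors get a neighbor mean of 0, so they are
    down-weighted (an assumption of this sketch).
    """
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    neighbor_sum = np.zeros(n)
    neighbor_count = np.zeros(n)
    np.add.at(neighbor_sum, edges[:, 0], scores[edges[:, 1]])
    np.add.at(neighbor_count, edges[:, 0], 1.0)
    neighbor_mean = np.divide(
        neighbor_sum, neighbor_count, out=np.zeros(n), where=neighbor_count > 0
    )
    return self_weight * scores + (1.0 - self_weight) * neighbor_mean


# Toy data: one object tracked over three frames (weakly detected in the
# middle frame) plus one isolated spurious detection in frame 1.
frames = [0, 1, 2, 1]
centers = [[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0], [30.0, 0.0, 0.0]]
scores = [0.9, 0.3, 0.9, 0.6]

edges = build_temporal_edges(frames, centers)
smoothed = smooth_scores(scores, edges)
keep = smoothed >= 0.5  # detections retained as pseudo-labels for retraining
print(smoothed, keep)
```

In this toy case the weak middle-frame detection is supported by its neighbors and kept, while the isolated detection falls below the threshold and is dropped. The kept detections play the role of the smoothed detections used to retrain the detector.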