3D object detection plays an important role in autonomous driving and other robotics applications. However, these detectors usually require training on large amounts of annotated data, which is expensive and time-consuming to collect. We instead propose to leverage large amounts of unlabeled point cloud video through semi-supervised learning of 3D object detectors with temporal graph neural networks. Our insight is that temporal smoothing can produce more accurate detections on unlabeled data, and that these smoothed detections can then be used to retrain the detector. We learn to perform this temporal reasoning with a graph neural network in which edges represent the relationships between candidate detections in different time frames. After semi-supervised learning, our method achieves state-of-the-art detection performance on the challenging nuScenes and H3D benchmarks, compared to baselines trained on the same amount of labeled data. The project page and code are available at https://www.jianrenw.com/SOD-TGNN/.
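The idea of smoothing detections over a temporal graph and reusing the result as pseudo-labels can be illustrated with a minimal sketch. This is not the released implementation: the learned graph neural network is replaced here by a fixed, hand-written neighbor-averaging step, and every name (`Detection`, `build_temporal_edges`, `smooth_scores`, `select_pseudo_labels`) and every parameter value (`max_dist`, `max_gap`, `self_weight`, `threshold`) is a hypothetical choice made for illustration only.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    frame: int      # index of the time frame the detection comes from
    center: tuple   # (x, y, z) box center, assumed to be in a shared world frame
    score: float    # detector confidence in [0, 1]


def build_temporal_edges(dets, max_dist=2.0, max_gap=1):
    """Link detections from different frames whose centers lie close together.

    Assumption: a plain distance gate stands in for whatever edge
    construction a learned model would use.
    """
    edges = {i: [] for i in range(len(dets))}
    for i, a in enumerate(dets):
        for j, b in enumerate(dets):
            gap = abs(a.frame - b.frame)
            if 0 < gap <= max_gap and math.dist(a.center, b.center) <= max_dist:
                edges[i].append(j)
    return edges


def smooth_scores(dets, edges, self_weight=0.5, rounds=1):
    """Blend each score with the mean score of its temporal neighbors.

    Assumption: fixed averaging replaces learned message passing.
    A detection with no neighbors receives zero support.
    """
    scores = [d.score for d in dets]
    for _ in range(rounds):
        updated = []
        for i, s in enumerate(scores):
            nbrs = edges[i]
            support = sum(scores[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
            updated.append(self_weight * s + (1.0 - self_weight) * support)
        scores = updated
    return scores


def select_pseudo_labels(dets, scores, threshold=0.5):
    """Keep detections whose smoothed score reaches the threshold."""
    return [d for d, s in zip(dets, scores) if s >= threshold]


# Toy input: one object seen in three consecutive frames (with a weak
# middle detection) plus one isolated, likely spurious detection.
dets = [
    Detection(frame=0, center=(0.0, 0.0, 0.0), score=0.9),
    Detection(frame=1, center=(0.5, 0.0, 0.0), score=0.4),
    Detection(frame=2, center=(1.0, 0.0, 0.0), score=0.9),
    Detection(frame=1, center=(30.0, 0.0, 0.0), score=0.6),
]
edges = build_temporal_edges(dets)
smoothed = smooth_scores(dets, edges)
pseudo_labels = select_pseudo_labels(dets, smoothed)
```

In this toy input, the weak middle detection is kept because its neighbors in adjacent frames support it, while the isolated detection falls below the threshold despite its higher raw score. In a semi-supervised loop, the retained detections would serve as pseudo-labels for retraining the detector on the unlabeled video.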