Current LiDAR-based 3D object detectors for autonomous driving are almost entirely trained on human-annotated data collected in specific geographical domains with specific sensor setups, making them difficult to adapt to a different domain. MODEST is the first work to train 3D object detectors without any labels. Our work, HyperMODEST, proposes a universal method implemented on top of MODEST that substantially accelerates the self-training process and does not require tuning on a specific dataset. We filter out intermediate pseudo-labels with low confidence scores before they are used for data augmentation. On the nuScenes dataset, we observe significant improvements of 1.6% in AP BEV at IoU=0.25 and 1.7% in AP BEV at IoU=0.5, both in the 0-80m range, while using only one-fifth of the training time of the original MODEST approach. On the Lyft dataset, we also observe an improvement over the baseline during the first round of iterative self-training. We explore the trade-off between high precision and high recall in the early stage of the self-training process by comparing our proposed method with two other score filtering methods: confidence score filtering for pseudo-labels with and without static label retention. The code and models of this work are available at https://github.com/TRAILab/HyperMODEST
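The filtering step described above can be illustrated with a minimal sketch. This is not the HyperMODEST implementation: the `PseudoLabel` type, the `filter_for_augmentation` function, the box layout, and the threshold value of 0.5 are all hypothetical names and values chosen for illustration. The sketch only shows the general idea of discarding low-confidence pseudo-labels before they enter an augmentation pool.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoLabel:
    # Hypothetical box layout: (x, y, z, length, width, height, yaw).
    box: Tuple[float, float, float, float, float, float, float]
    # Detector confidence score in [0, 1].
    score: float


def filter_for_augmentation(
    labels: List[PseudoLabel], min_score: float = 0.5
) -> List[PseudoLabel]:
    """Keep only pseudo-labels whose score is at least min_score.

    The threshold is an assumed value, not one taken from the paper.
    """
    return [label for label in labels if label.score >= min_score]


# Example: three pseudo-labels from one self-training round (made-up values).
labels = [
    PseudoLabel(box=(10.0, 2.0, 0.5, 4.2, 1.8, 1.6, 0.0), score=0.9),
    PseudoLabel(box=(25.0, -3.0, 0.4, 4.0, 1.7, 1.5, 1.2), score=0.2),
    PseudoLabel(box=(40.0, 1.0, 0.6, 4.5, 1.9, 1.7, -0.3), score=0.5),
]

# Only the labels that pass the threshold would be added to the
# augmentation pool (for example, a database of object point clouds).
kept = filter_for_augmentation(labels, min_score=0.5)
```

Raising `min_score` favors precision in the augmentation pool at the cost of recall, which is the trade-off the abstract refers to.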