Curating large-scale datasets remains costly and demands considerable time and resources. Data is often labeled manually, and creating high-quality datasets remains a challenge. In this work, we address a research gap by applying active learning to multi-modal 3D object detection. We propose ActiveAnno3D, an active learning framework that selects for labeling the data samples that are most informative for training. We explore several continuous training methods and integrate the one that is most efficient in terms of computational demand and detection performance. Furthermore, we perform extensive experiments and ablation studies with BEVFusion and PV-RCNN on the nuScenes and TUM Traffic Intersection datasets. On the TUM Traffic Intersection dataset, PV-RCNN with the entropy-based query strategy achieves almost the same performance when using only half of the training data (77.25 mAP compared to 83.50 mAP). On nuScenes, BEVFusion achieves an mAP of 64.31 with half of the training data and 75.0 mAP with the complete dataset. We integrate our active learning framework into the proAnno labeling tool to enable AI-assisted data selection and labeling and to minimize labeling costs. Finally, we provide code, weights, and visualization results on our website: https://active3d-framework.github.io/active3d-framework.
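To make the entropy-based query strategy concrete, the following is a minimal sketch of how unlabeled frames could be ranked by predictive uncertainty. It is not the ActiveAnno3D implementation: the function names `mean_entropy` and `select_for_labeling`, the frame identifiers, and the choice to average entropy over all detections in a frame are assumptions made purely for illustration.

```python
import math
from typing import Dict, List, Sequence


def mean_entropy(class_probs: Sequence[Sequence[float]]) -> float:
    """Average Shannon entropy (in nats) over the detections of one frame.

    `class_probs` holds one class-probability vector per detected object.
    Averaging over detections is an assumed aggregation, not the paper's.
    """
    if not class_probs:
        # Assumption: a frame without detections gets the lowest score.
        return 0.0
    total = 0.0
    for probs in class_probs:
        # Skip zero probabilities, since p * log(p) tends to 0 as p -> 0.
        total += -sum(p * math.log(p) for p in probs if p > 0.0)
    return total / len(class_probs)


def select_for_labeling(
    pool: Dict[str, Sequence[Sequence[float]]], budget: int
) -> List[str]:
    """Return the `budget` frame ids with the highest mean entropy."""
    ranked = sorted(
        pool, key=lambda frame_id: mean_entropy(pool[frame_id]), reverse=True
    )
    return ranked[:budget]


# Hypothetical unlabeled pool: frame id -> per-detection class probabilities.
pool = {
    "frame_a": [[0.98, 0.01, 0.01]],
    "frame_b": [[0.40, 0.35, 0.25], [0.50, 0.50, 0.00]],
    "frame_c": [[0.70, 0.20, 0.10]],
}
selected = select_for_labeling(pool, budget=2)
```

In an active learning round, the selected frames would be sent to annotators, added to the labeled set, and the detector retrained before the remaining pool is scored again. Scoring empty frames as zero is one possible design choice; a real system might treat missed detections differently.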