Object detection (OD), a crucial vision task, remains challenged by the lack of large training datasets with precise object localization labels. In this work, we propose ALWOD, a new framework that addresses this problem by fusing active learning (AL) with weakly and semi-supervised object detection paradigms. Because the performance of AL critically depends on the model initialization, we propose a new auxiliary image generator strategy that utilizes an extremely small labeled set, coupled with a large weakly tagged set of images, as a warm-start for AL. We then propose a new AL acquisition function, another critical factor in AL success, that leverages the disagreement and uncertainty of a student-teacher OD pair to propose the most informative images to annotate. Finally, to complete the AL loop, we introduce a new labeling task delegated to human annotators, based on selection and correction of model-proposed detections, which is both rapid and effective in labeling the informative images. We demonstrate, across several challenging benchmarks, that ALWOD significantly narrows the gap between ODs trained on a few partially labeled but strategically selected image instances and those that rely on fully labeled data. Our code is publicly available at https://github.com/seqam-lab/ALWOD.
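The abstract does not state the form of the acquisition function, so the following is only a minimal sketch of how student-teacher disagreement and detection uncertainty could be combined to rank unlabeled images. Everything in it is an assumption made for illustration and is not taken from ALWOD: the IoU-based disagreement measure, the binary-entropy uncertainty, the additive weighting, and all function names (`disagreement`, `uncertainty`, `acquisition_score`, `select_for_annotation`).

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[Box, float, int]        # (box, confidence, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def one_way_agreement(src: List[Detection], dst: List[Detection]) -> float:
    """Mean best same-class IoU of each detection in src against dst."""
    if not src:
        return 1.0 if not dst else 0.0
    total = 0.0
    for box, _, label in src:
        total += max((iou(box, b) for b, _, l in dst if l == label), default=0.0)
    return total / len(src)


def disagreement(teacher: List[Detection], student: List[Detection]) -> float:
    """Hypothetical disagreement: 1 minus the symmetric mean best-match IoU."""
    agree = 0.5 * (one_way_agreement(teacher, student)
                   + one_way_agreement(student, teacher))
    return 1.0 - agree


def uncertainty(dets: List[Detection]) -> float:
    """Hypothetical uncertainty: mean binary entropy (bits) of the confidences."""
    if not dets:
        return 0.0

    def entropy(p: float) -> float:
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
        return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

    return sum(entropy(score) for _, score, _ in dets) / len(dets)


def acquisition_score(teacher: List[Detection], student: List[Detection],
                      weight: float = 1.0) -> float:
    """Assumed additive combination; higher means more informative."""
    return disagreement(teacher, student) + weight * uncertainty(teacher)


def select_for_annotation(
    pool: Dict[str, Tuple[List[Detection], List[Detection]]], budget: int
) -> List[str]:
    """Return the ids of the `budget` highest-scoring images in the pool."""
    ranked = sorted(pool, key=lambda k: acquisition_score(*pool[k]), reverse=True)
    return ranked[:budget]
```

Here `pool` maps an image id to its `(teacher_detections, student_detections)` pair. Under these assumptions, an image on which the two detectors place low-confidence boxes in different locations outranks one on which they agree with high confidence.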