Learning-based solutions for vision tasks require large amounts of labeled training data to ensure their performance and reliability. In single-task vision settings, inconsistency-based active learning has proven effective at selecting informative samples for annotation. However, little research has exploited the inconsistency between multiple tasks in multi-task networks. To address this gap, we propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation. Our approach leverages the inconsistency between the two tasks to identify samples that are informative for both. We propose three constraints that specify how the tasks are coupled and introduce a method for determining which pixels belong to the object detected by a bounding box, so that the constraints can then be quantified as inconsistency scores. To evaluate the effectiveness of our approach, we establish multiple baselines for multi-task active learning and introduce a new metric, mean Detection Segmentation Quality (mDSQ), which is tailored to comparing multi-task active learning strategies and accounts for the performance of both tasks. We conduct extensive experiments on the nuImages and A9 datasets, demonstrating that our approach outperforms existing state-of-the-art methods by up to 3.4% mDSQ on nuImages. Our approach achieves 95% of the fully-trained performance using only 67% of the available data, corresponding to 20% fewer labels compared to random selection and 5% fewer labels compared to the state-of-the-art selection strategy. Our code will be made publicly available after the review process.
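To make the idea of a detection-segmentation inconsistency score concrete, the following is a minimal sketch, not the method proposed here. The abstract does not state the three constraints or how object pixels are determined, so the sketch assumes a single, simplified constraint: pixels inside a detected box should carry the detected class in the segmentation output. The function names `box_mask_inconsistency` and `image_inconsistency`, the `(x1, y1, x2, y2)` box format, and the averaging over boxes are all hypothetical choices made for illustration.

```python
import numpy as np


def box_mask_inconsistency(seg_labels, box, cls):
    """Hypothetical per-box score in [0, 1].

    Returns the share of pixels inside the box that the segmentation
    output does NOT assign to the detected class `cls`.
    Assumption: `seg_labels` is an (H, W) array of predicted class ids and
    `box` holds integer pixel coordinates (x1, y1, x2, y2), end-exclusive.
    """
    x1, y1, x2, y2 = box
    region = seg_labels[y1:y2, x1:x2]
    if region.size == 0:
        # An empty box has no supporting pixels; treat it as fully inconsistent.
        return 1.0
    return 1.0 - float(np.mean(region == cls))


def image_inconsistency(seg_labels, detections):
    """Hypothetical image-level score: mean of the per-box scores.

    `detections` is a list of (box, cls) pairs; an image without
    detections scores 0.0 under this simplified assumption.
    """
    if not detections:
        return 0.0
    scores = [box_mask_inconsistency(seg_labels, b, c) for b, c in detections]
    return float(np.mean(scores))


# Toy example: a 4x4 prediction where class 1 fills the top-left 2x2 block.
seg = np.zeros((4, 4), dtype=int)
seg[0:2, 0:2] = 1
tight_box = ((0, 0, 2, 2), 1)   # box matches the segmented object
loose_box = ((0, 0, 4, 4), 1)   # box covers mostly background
score = image_inconsistency(seg, [tight_box, loose_box])
```

In an active learning loop, unlabeled images would be ranked by such a score and the highest-scoring ones sent for annotation. Because a bounding box normally contains some background, this naive score penalizes loose but correct boxes; determining which pixels actually belong to the detected object, as the approach above does, is what such a sketch leaves out.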