Dense prediction tasks such as object detection and segmentation require high-quality labels at the pixel level, which are costly to obtain. Recent advances in foundation models have enabled the generation of autolabels, which we find to be competitive but not yet sufficient to fully replace human annotations, especially for more complex datasets. Thus, we consider the challenge of selecting a representative subset of images for labeling from a large pool of unlabeled images under a constrained annotation budget. This task is further complicated by imbalanced class distributions, as rare classes are often underrepresented in selected subsets. We propose object-focused data selection (OFDS), which leverages object-level representations to ensure that the selected image subsets semantically cover the target classes, including rare ones. We validate OFDS on PASCAL VOC and Cityscapes for object detection and semantic segmentation tasks. Our experiments demonstrate that prior methods that employ image-level representations fail to consistently outperform random selection. In contrast, OFDS consistently achieves state-of-the-art performance, with substantial improvements over all baselines in scenarios with imbalanced class distributions. Moreover, we demonstrate that pre-training with autolabels on the full datasets before fine-tuning on human-labeled subsets selected by OFDS further enhances the final performance.