Despite the availability of large datasets for tasks like image classification and image-text alignment, labeled data for more complex recognition tasks, such as detection and segmentation, is less abundant. In particular, instance segmentation annotations are time-consuming to produce, and the distribution of instances is often highly skewed across classes. While semi-supervised teacher-student distillation methods show promise in leveraging vast amounts of unlabeled data, they suffer from miscalibration, resulting in overconfidence in frequently represented classes and underconfidence in rarer ones. Additionally, these methods have difficulty learning efficiently from a limited set of examples. We address both problems. First, we introduce a dual strategy to enhance the teacher model's training process, substantially improving few-shot performance. Second, we propose a calibration correction mechanism that enables the student model to correct the teacher's calibration errors. Using our approach, we observed marked improvements over a state-of-the-art supervised baseline on the LVIS dataset, with a gain of 2.8% in average precision (AP) and 10.3% in AP for rare classes.