Self-training and active learning have recently been proposed to alleviate this problem. Self-training can improve model accuracy by exploiting massive unlabeled data, but it generates noisy pseudo labels when the training data are limited or imbalanced, and without human guidance the resulting models are suboptimal. Active learning selects more effective data for human intervention, but it cannot improve model accuracy further because the massive unlabeled data go unused. Moreover, when the domain difference is too large, the probability of querying suboptimal samples rises, which increases annotation cost. This paper proposes an iterative loop learning method that combines Self-Training and Active Learning (STAL) for domain adaptive semantic segmentation. The method first applies self-training to the massive unlabeled data to improve model accuracy and to provide a more accurate selection model for active learning. Second, using the sample selection strategy of active learning, manual intervention corrects the self-training process. The two steps are iterated to achieve the best performance with minimal labeling cost. Extensive experiments show that our method establishes state-of-the-art performance on the GTAV to Cityscapes and SYNTHIA to Cityscapes tasks, improving on the previous best method by 4.9% mIoU and 5.2% mIoU, respectively. Code will be available.
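The alternation between self-training and active learning described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The nearest-centroid "model" on 1-D points, the margin-based confidence measure, and the names `fit_centroids`, `predict`, `stal_loop`, `rounds`, `budget`, and `threshold` are all assumptions made for illustration, and the abstract does not specify the actual sample selection strategy or pseudo-label filter.

```python
def fit_centroids(points, labels):
    """Fit a toy nearest-centroid 'model': one mean per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(model, x):
    """Return (label, margin); a small margin means an uncertain prediction."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - model[ranked[1]]) - abs(x - model[best])
    return best, margin


def stal_loop(source, target, oracle, rounds=3, budget=2, threshold=1.0):
    """Alternate self-training and active learning (hypothetical sketch).

    source: list of (x, y) labeled source-domain pairs
    target: list of unlabeled target-domain values
    oracle: callable standing in for the human annotator
    """
    labeled = {}  # target index -> human-provided label
    model = fit_centroids([x for x, _ in source], [y for _, y in source])
    for _ in range(rounds):
        # Self-training step: keep only confident pseudo labels.
        pseudo = {}
        for i, x in enumerate(target):
            if i in labeled:
                continue
            y, margin = predict(model, x)
            if margin >= threshold:
                pseudo[i] = y
        # Active-learning step: query the least confident unlabeled samples.
        candidates = [i for i in range(len(target)) if i not in labeled]
        candidates.sort(key=lambda i: predict(model, target[i])[1])
        for i in candidates[:budget]:
            labeled[i] = oracle(target[i])
        # Retrain on source data, pseudo labels, and human labels;
        # human labels override pseudo labels for the same sample.
        xs = [x for x, _ in source]
        ys = [y for _, y in source]
        for i, y in {**pseudo, **labeled}.items():
            xs.append(target[i])
            ys.append(y)
        model = fit_centroids(xs, ys)
    return model, labeled
```

In this sketch, each round spends at most `budget` annotations on the samples the current model is least sure about, while every other sufficiently confident prediction is reused as a pseudo label. This mirrors the idea in the abstract that self-training supplies a better selection model and human labels correct the self-training.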