Recently, self-training and active learning have been proposed to alleviate this problem. Self-training can improve model accuracy by exploiting massive unlabeled data, but it generates noisy pseudo labels when the training data are limited or imbalanced, and without human guidance the resulting models are suboptimal. Active learning selects more effective data for human intervention, but it cannot improve model accuracy because the massive unlabeled data are not used. Moreover, when the domain difference is too large, the probability of querying suboptimal samples rises, which increases annotation cost. This paper proposes an iterative loop learning method combining Self-Training and Active Learning (STAL) for domain adaptive semantic segmentation. The method first uses self-training to learn from massive unlabeled data, which improves model accuracy and gives active learning a more accurate selection model. Second, using the sample selection strategy of active learning, manual intervention corrects the self-training process. These two steps are iterated to achieve the best performance with minimal labeling cost. Extensive experiments show that our method establishes state-of-the-art performance on the GTAV to Cityscapes and SYNTHIA to Cityscapes tasks, improving on the previous best method by 4.9% mIoU and 5.2% mIoU, respectively. The code is available at https://github.com/licongguan/STAL.
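The alternation between self-training and active learning described above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). The abstract does not specify the selection strategy, so the confidence threshold for pseudo labels, the entropy criterion for queries, and all names here (`pseudo_labels`, `query_indices`, `stal_loop`, `train_and_predict`, `oracle`) are assumptions made for illustration. The sketch also works on per-sample class probabilities rather than full segmentation maps.

```python
import math


def entropy(p):
    """Shannon entropy of one class-probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0)


def pseudo_labels(probs, threshold=0.9):
    """Self-training step (assumed rule): keep the argmax label only
    where the top probability reaches `threshold`."""
    out = {}
    for i, p in enumerate(probs):
        conf = max(p)
        if conf >= threshold:
            out[i] = p.index(conf)
    return out


def query_indices(probs, labeled, budget):
    """Active-learning step (assumed rule): pick the `budget` samples
    with the highest entropy that have no human label yet."""
    candidates = [i for i in range(len(probs)) if i not in labeled]
    candidates.sort(key=lambda i: entropy(probs[i]), reverse=True)
    return candidates[:budget]


def stal_loop(train_and_predict, oracle, rounds=3, budget=1, threshold=0.9):
    """Alternate self-training and human correction for `rounds` rounds.

    `train_and_predict(targets)` is a hypothetical callback that trains
    on the given {index: label} targets and returns class probabilities
    for every sample; `oracle(i)` stands in for the human annotator.
    """
    human = {}    # index -> label supplied by the annotator
    targets = {}  # labels used for the next training pass
    for _ in range(rounds):
        probs = train_and_predict(targets)
        for i in query_indices(probs, human, budget):
            human[i] = oracle(i)
        # Human labels override pseudo labels on the same sample.
        targets = {**pseudo_labels(probs, threshold), **human}
    return targets, human


if __name__ == "__main__":
    # Toy stand-in for a model: it ignores the targets and returns fixed
    # probabilities, which is enough to trace the selection logic.
    fixed = [[0.95, 0.05], [0.5, 0.5], [0.6, 0.4], [0.08, 0.92]]
    targets, human = stal_loop(lambda t: fixed, lambda i: 1, rounds=2)
    print(targets, human)
```

In this sketch, confident predictions become pseudo labels, while the most uncertain samples are sent to the annotator. A real segmentation pipeline would apply the same logic per pixel or per region and retrain the network inside `train_and_predict`.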