In semi-supervised learning (SSL), unlabeled samples can be exploited through augmentation and consistency regularization. However, we observe that certain samples, even after strong augmentation, are still classified correctly with high confidence, yielding a loss close to zero. This indicates that these samples have already been learned well and provide no additional optimization benefit to the model. We refer to them as "naive samples". Existing SSL models overlook the characteristics of naive samples and apply the same learning strategy to all samples. To further optimize SSL models, we argue that naive samples deserve attention and should be augmented in a more diverse manner. We propose sample adaptive augmentation (SAA) for this purpose. It consists of two modules: 1) a sample selection module; 2) a sample augmentation module. Specifically, the sample selection module identifies naive samples at each epoch based on historical training information, and the sample augmentation module then augments those naive samples in a more diverse manner. Because both modules are very easy to implement, SAA is simple and lightweight. We add SAA on top of FixMatch and FlexMatch, and experiments demonstrate that SAA significantly improves both models. For example, on CIFAR-10 with 40 labels, SAA improves the accuracy of FixMatch from 92.50% to 94.76% and that of FlexMatch from 95.01% to 95.31%.
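The two modules described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that selection uses "historical training information" and that naive samples are augmented "in a more diverse manner". The sketch assumes that the history is an exponential moving average (EMA) of each sample's consistency loss, that a fixed threshold marks a sample as naive, and that "more diverse" means applying extra augmentation operations. The names `NaiveSampleTracker`, `momentum`, `threshold`, and `num_augment_ops` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NaiveSampleTracker:
    """Hypothetical sample selection module.

    Keeps an EMA of each unlabeled sample's consistency loss. The EMA,
    the momentum, and the fixed threshold are all assumptions made for
    illustration; the abstract does not specify them.
    """
    momentum: float = 0.9    # assumed EMA decay
    threshold: float = 0.05  # assumed cutoff below which a sample is "naive"
    history: dict = field(default_factory=dict)

    def update(self, sample_ids, losses):
        """Fold the latest per-sample losses into the running history."""
        for sid, loss in zip(sample_ids, losses):
            prev = self.history.get(sid)
            if prev is None:
                self.history[sid] = loss
            else:
                self.history[sid] = (
                    self.momentum * prev + (1.0 - self.momentum) * loss
                )

    def naive_ids(self):
        """Return ids whose historical loss stays below the threshold."""
        return {
            sid for sid, ema in self.history.items() if ema < self.threshold
        }


def num_augment_ops(sample_id, naive, base_ops=2, extra_ops=2):
    """Hypothetical sample augmentation policy.

    Naive samples receive extra augmentation operations; all other
    samples keep the base strong-augmentation budget.
    """
    return base_ops + extra_ops if sample_id in naive else base_ops


# Usage: two epochs of (made-up) consistency losses for samples 0 and 1.
tracker = NaiveSampleTracker(momentum=0.5, threshold=0.1)
tracker.update([0, 1], [0.0, 1.0])
tracker.update([0, 1], [0.02, 0.8])
naive = tracker.naive_ids()  # sample 0 has a near-zero loss history
ops = [num_augment_ops(sid, naive) for sid in (0, 1)]
```

In this toy run, sample 0 keeps a near-zero loss across both epochs and is selected as naive, so it is assigned four augmentation operations, while sample 1 keeps the base two. In a training loop, `update` would be called with the per-sample unlabeled losses of each batch and `naive_ids` once per epoch.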