Semi-Supervised Learning (SSL) aims to learn a model from a small labeled set together with large amounts of unlabeled data. To better exploit the unlabeled data, recent SSL methods use pseudo-labels predicted by a single discriminative classifier. However, the generated pseudo-labels inevitably carry confirmation bias and noise, which greatly degrades model performance. In this work we introduce a new SSL framework named NorMatch. First, we introduce a new uncertainty estimation scheme based on normalizing flows, used as an auxiliary classifier, to enforce highly certain pseudo-labels and thereby boost the discriminative classifier. Second, we introduce a threshold-free sample weighting strategy to better exploit both high- and low-confidence pseudo-labels. Furthermore, we use normalizing flows to model, in an unsupervised fashion, the distribution of the unlabeled data. This modeling can further improve the generative classifier via unlabeled data and thus implicitly contributes to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
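To illustrate the contrast between hard-threshold pseudo-labeling and a threshold-free weighting, here is a minimal sketch in NumPy. The function names and the confidence-as-weight rule are hypothetical illustrations, not NorMatch's actual formulation: a FixMatch-style scheme keeps only pseudo-labels whose confidence exceeds a cutoff, while a threshold-free scheme lets every pseudo-label contribute, scaled by its confidence.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over class logits.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def thresholded_mask(probs, tau=0.95):
    # FixMatch-style: a pseudo-label contributes to the loss
    # only if its confidence exceeds the threshold tau.
    return (probs.max(axis=-1) >= tau).astype(float)

def threshold_free_weights(probs):
    # Hypothetical threshold-free weighting: every pseudo-label
    # contributes, down-weighted by its confidence (no hard cutoff).
    return probs.max(axis=-1)

logits = np.array([[5.0, 0.5, 0.2],   # confident prediction
                   [1.0, 0.9, 0.8]])  # uncertain prediction
p = softmax(logits)
print(thresholded_mask(p))       # uncertain sample is discarded entirely
print(threshold_free_weights(p)) # both samples kept, weighted softly
```

Under hard thresholding, the uncertain sample is dropped and contributes no gradient; under the soft weighting, it still contributes, just with a smaller weight.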