InPL: Pseudo-labeling the Inliers First for Imbalanced Semi-supervised Learning

Recent state-of-the-art methods in imbalanced semi-supervised learning (SSL) rely on confidence-based pseudo-labeling with consistency regularization. To obtain high-quality pseudo-labels, a high confidence threshold is typically adopted. However, it has been shown that softmax-based confidence scores in deep networks can be arbitrarily high for samples far from the training data, so the pseudo-labels of even high-confidence unlabeled samples may still be unreliable. In this work, we present a new perspective on pseudo-labeling for imbalanced SSL. Instead of relying on model confidence, we propose to measure whether an unlabeled sample is likely to be "in-distribution", i.e., close to the current training data. To decide whether an unlabeled sample is "in-distribution" or "out-of-distribution", we adopt the energy score from the out-of-distribution detection literature. As training progresses and more unlabeled samples become in-distribution and contribute to training, the combined labeled and pseudo-labeled data better approximate the true class distribution, which in turn improves the model. Experiments demonstrate that our energy-based pseudo-labeling method, InPL, albeit conceptually simple, significantly outperforms confidence-based methods on imbalanced SSL benchmarks. For example, it produces around 3% absolute accuracy improvement on CIFAR10-LT. When combined with state-of-the-art long-tailed SSL methods, further improvements are attained. In particular, in one of the most challenging scenarios, InPL achieves a 6.9% accuracy improvement over the best competitor.
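To make the contrast between confidence-based and energy-based selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes the energy score takes the form commonly used in the out-of-distribution detection literature, the negative temperature-scaled log-sum-exp of the logits, where lower energy suggests a sample is more in-distribution. The function names, the temperature default, and the threshold value of -8.0 are hypothetical choices made for illustration only.

```python
import math


def energy_score(logits, temperature=1.0):
    """E(x) = -T * log(sum_k exp(f_k(x) / T)); lower suggests in-distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating for stability
    return -temperature * (m + math.log(sum(math.exp(s - m) for s in scaled)))


def softmax_confidence(logits):
    """Maximum softmax probability, the usual confidence criterion."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def pseudo_label(logits, energy_threshold=-8.0):
    """Return the argmax class if the energy is below the threshold, else None.

    The threshold of -8.0 is an illustrative assumption, not a value from the paper.
    """
    if energy_score(logits) < energy_threshold:
        return max(range(len(logits)), key=lambda k: logits[k])
    return None


# Hypothetical logits for two unlabeled samples.
peaked_small = [4.0, -2.0, -2.0]   # high softmax confidence, small logit magnitude
spread_large = [10.0, 8.5, 0.0]    # lower softmax confidence, large logit magnitude

for name, logits in [("peaked_small", peaked_small), ("spread_large", spread_large)]:
    print(name,
          round(softmax_confidence(logits), 3),
          round(energy_score(logits), 3),
          pseudo_label(logits))
```

In this toy setting the two criteria disagree: the first sample would pass a 0.95 confidence threshold but is rejected by the energy rule, while the second is accepted by the energy rule despite its lower softmax confidence. Whether such disagreements help in practice depends on the trained model and the chosen threshold.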
