Consistency regularization and pseudo-labeling have significantly advanced semi-supervised learning (SSL). Prior works have effectively employed Mixup for consistency regularization in SSL. However, our findings indicate that applying Mixup for consistency regularization may degrade SSL performance by compromising the purity of artificial labels. Moreover, most pseudo-labeling-based methods use a confidence threshold to exclude low-confidence data in order to mitigate confirmation bias, but this approach limits how much of the unlabeled data is actually used. To address these challenges, we propose RegMixMatch, a novel framework that optimizes the use of Mixup with both high- and low-confidence samples in SSL. First, we introduce semi-supervised RegMixup, which addresses the reduced purity of artificial labels by training on both mixed samples and clean samples. Second, we develop a class-aware Mixup technique that integrates information from the top-2 predicted classes into low-confidence samples and their artificial labels, reducing the confirmation bias associated with these samples and making better use of them. Experimental results demonstrate that RegMixMatch achieves state-of-the-art performance across various SSL benchmarks.
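To make the idea of "training on both mixed samples and clean samples" concrete, here is a minimal NumPy sketch of a RegMixup-style loss applied to confidence-thresholded pseudo-labels. It is an illustration under stated assumptions, not the RegMixMatch implementation. The function names, the threshold `tau`, the Beta parameter `alpha`, the weight `eta`, and the choice to mix strongly augmented views are all hypothetical. The sketch covers only the high-confidence branch. The class-aware Mixup for low-confidence samples is not shown, because the abstract does not specify how it is constructed.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def soft_cross_entropy(logits, targets):
    """Mean cross-entropy between softmax(logits) and soft targets of shape (N, C)."""
    log_probs = np.log(softmax(logits) + 1e-12)
    return float(-(targets * log_probs).sum(axis=1).mean())


def regmixup_unlabeled_loss(model, x_weak, x_strong, rng, tau=0.95, alpha=10.0, eta=1.0):
    """Hypothetical semi-supervised RegMixup-style loss for one unlabeled batch.

    model: callable mapping inputs (N, D) to logits (N, C).
    x_weak / x_strong: weakly / strongly augmented views of the same samples.
    tau, alpha, eta: assumed threshold, Beta parameter, and mixed-term weight.
    """
    # 1. Pseudo-labels come from the weak view; keep only confident samples.
    probs = softmax(model(x_weak))
    mask = probs.max(axis=1) >= tau
    if not mask.any():
        return 0.0
    num_classes = probs.shape[1]
    pseudo = np.eye(num_classes)[probs.argmax(axis=1)][mask]  # one-hot artificial labels
    x = x_strong[mask]

    # 2. Clean term: the unmixed strong view is trained on its own pseudo-label.
    clean_loss = soft_cross_entropy(model(x), pseudo)

    # 3. Mixed term: Mixup of strong views and their pseudo-labels within the batch.
    lam = rng.beta(alpha, alpha)
    perm = rng.permutation(len(x))
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * pseudo + (1 - lam) * pseudo[perm]
    mixed_loss = soft_cross_entropy(model(x_mix), y_mix)

    # RegMixup-style objective: mixed samples are added alongside clean ones,
    # rather than replacing them, so the clean pseudo-label signal is kept.
    return clean_loss + eta * mixed_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 3))
    model = lambda x: 5.0 * (x @ W)  # toy linear classifier standing in for a network
    x_weak = rng.normal(size=(8, 4))
    x_strong = x_weak + 0.1 * rng.normal(size=(8, 4))
    print(regmixup_unlabeled_loss(model, x_weak, x_strong, rng, tau=0.6))
```

The key design point is the sum `clean_loss + eta * mixed_loss`. Standard Mixup-based consistency trains only on mixed inputs with mixed artificial labels. The RegMixup-style form keeps a term on unmixed samples with unmixed pseudo-labels and treats the mixed term as a regularizer on top of it.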