Existing semi-supervised learning (SSL) algorithms adopt pseudo-labeling and consistency regularization techniques to introduce supervision signals for unlabeled samples. To overcome the inherent limitation of threshold-based pseudo-labeling, prior studies have attempted to align the confidence threshold with the evolving learning status of the model, which is estimated from the model's predictions on the unlabeled data. In this paper, we further reveal that classifier weights can reflect the differentiated learning status across categories, and we consequently propose a class-specific adaptive threshold mechanism. Additionally, since even the optimal threshold scheme cannot resolve the problem of discarding unlabeled samples, we design a binary classification consistency regularization approach that distinguishes candidate classes from negative options for all unlabeled samples. By combining these strategies, we present a novel SSL algorithm named AllMatch, which achieves improved pseudo-label accuracy and a 100% utilization ratio for the unlabeled data. We extensively evaluate our approach on multiple benchmarks, encompassing both balanced and imbalanced settings. The results demonstrate that AllMatch consistently outperforms existing state-of-the-art methods.
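The abstract does not specify how classifier weights are mapped to per-class thresholds or how the binary candidate/negative targets are formed. The NumPy sketch below is only a hypothetical illustration of the two ideas and is not AllMatch's actual formulation. It assumes that (a) the L2 norm of each class's weight vector serves as a proxy for that class's learning status, so less-learned classes get a proportionally lower threshold, and (b) the top-`k` classes of the weak view are the "candidates" and the remaining classes are the negatives. The function names `class_thresholds`, `pseudo_label_mask`, `candidate_mask`, and `binary_consistency_loss`, along with the parameters `tau` and `k`, are all illustrative.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the class axis (rows are samples).
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def class_thresholds(weights, tau=0.95):
    # HYPOTHETICAL mapping: use the L2 norm of each class's weight vector
    # (rows of `weights`, shape (C, D)) as a proxy for its learning status,
    # normalized so the best-learned class receives the full threshold `tau`.
    norms = np.linalg.norm(weights, axis=1)
    return tau * norms / norms.max()


def pseudo_label_mask(probs, thresholds):
    # Keep a sample only if its top confidence clears the threshold of its
    # own predicted class, rather than a single global threshold.
    labels = probs.argmax(axis=1)
    conf = probs.max(axis=1)
    return labels, conf >= thresholds[labels]


def candidate_mask(probs, k=2):
    # HYPOTHETICAL split: the top-k classes of the weak view are candidates,
    # all other classes are negatives. Every sample gets such a split, so no
    # unlabeled sample is discarded.
    top_k = np.argsort(-probs, axis=1)[:, :k]
    mask = np.zeros_like(probs, dtype=bool)
    np.put_along_axis(mask, top_k, True, axis=1)
    return mask


def binary_consistency_loss(strong_probs, cand, eps=1e-12):
    # Binary "candidate vs. negative" objective: the probability of the
    # candidate side is the strong view's total mass on the candidate classes;
    # minimizing -log of it pushes mass away from the negatives.
    p_cand = (strong_probs * cand).sum(axis=1)
    return float(-np.log(np.clip(p_cand, eps, 1.0)).mean())


if __name__ == "__main__":
    W = np.array([[2.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # toy weights, C=3, D=2
    weak = np.array([[0.9, 0.06, 0.04], [0.1, 0.6, 0.3], [0.4, 0.35, 0.25]])
    thr = class_thresholds(W)  # per-class thresholds 0.95, 0.475, 0.475
    labels, keep = pseudo_label_mask(weak, thr)  # only the second sample is kept
    cand = candidate_mask(weak, k=2)  # every sample still receives a binary target
    print(labels, keep, binary_consistency_loss(weak, cand))
```

In this toy run, threshold-based pseudo-labeling retains only one of the three samples. The candidate/negative split, by contrast, still supplies a training signal for all three, which illustrates the "100% utilization" motivation. In practice the strong-view probabilities would come from a separately augmented input rather than reusing `weak`.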