Class imbalance remains a critical challenge in semi-supervised learning (SSL), especially when distribution mismatch between labeled and unlabeled data leads to biased classification. Although existing methods address this issue by adjusting logits according to the estimated class distribution of the unlabeled data, they typically treat model imbalance in a coarse-grained manner, conflating data imbalance with the bias that arises from varying class-specific learning difficulties. To address this problem, we propose a unified framework, SC-SSL, which suppresses model bias through decoupled sampling control. During training, we identify the key variables governing sampling control under ideal conditions. By introducing a classifier with explicit expansion capability and adaptively adjusting sampling probabilities across different data distributions, SC-SSL mitigates feature-level imbalance for minority classes. In the inference phase, we further analyze the weight imbalance of the linear classifier and apply post-hoc sampling control with an optimized bias vector to calibrate the logits directly. Extensive experiments across a range of benchmark datasets and distribution settings demonstrate the consistency and state-of-the-art performance of SC-SSL.
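The abstract does not specify SC-SSL's exact calibration formula, but its inference-phase step resembles standard post-hoc logit adjustment, where a class-wise bias vector derived from an estimated class prior is subtracted from the raw logits. The following is a minimal sketch of that generic technique under that assumption; the function names (`estimate_bias_vector`, `calibrate_logits`) and the temperature parameter `tau` are illustrative, not the paper's API.

```python
import numpy as np

def estimate_bias_vector(class_prior: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Generic logit-adjustment bias: scaled log of the estimated class prior.

    Subtracting this vector from raw logits counteracts the classifier's
    preference for majority classes. This is standard logit adjustment,
    not necessarily SC-SSL's optimized bias vector.
    """
    return tau * np.log(class_prior + 1e-12)

def calibrate_logits(logits: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Post-hoc calibration: shift each class logit by its bias term."""
    return logits - bias

# Toy usage: a 3-class problem with a long-tailed estimated class distribution.
prior = np.array([0.7, 0.2, 0.1])            # estimated class prior (assumed)
bias = estimate_bias_vector(prior)
raw = np.array([[2.0, 1.8, 1.7]])            # raw logits favor the head class
calibrated = calibrate_logits(raw, bias)
print(calibrated.argmax(axis=1))             # prediction shifts toward the tail class
```

In this toy example the uncalibrated argmax picks the majority class 0, while subtracting the log-prior bias flips the prediction to the minority class 2, which is the qualitative effect the abstract attributes to its post-hoc sampling control.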