Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, in which skewed labeled distributions bias the classifier. This problem is often aggravated by discrepancies between the labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework that dynamically estimates the actual distribution of the unlabeled data, aligns predictions with it, and achieves a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, which promotes fair data usage across classes and effectively leverages underconfident samples. The resulting method, ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127. Our code will be made available upon paper acceptance.
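As a rough illustration of the logit-adjustment idea mentioned above, the following minimal Python sketch shifts the logits by the log-ratio of a labeled (source) class prior to a target class prior before computing cross-entropy, and keeps a running estimate of the unlabeled class distribution. It is a generic sketch and not the FlexDA implementation: the function names, the temperature `tau`, the `momentum` value, and the moving-average estimator are all assumptions made for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def logit_adjusted_ce(logits, label, source_prior, target_prior, tau=1.0):
    """Cross-entropy on logits shifted by tau * log(source_prior / target_prior).

    Hypothetical helper: with target_prior == source_prior the shift is zero
    (plain cross-entropy); with a uniform target_prior it reduces, up to a
    constant, to the standard logit-adjusted loss for a balanced classifier.
    """
    shifted = [
        z + tau * (math.log(s) - math.log(t))
        for z, s, t in zip(logits, source_prior, target_prior)
    ]
    return -log_softmax(shifted)[label]


def ema_update(prior, batch_probs, momentum=0.9):
    """Assumed estimator: moving average of mean predicted class probabilities."""
    num_classes = len(prior)
    batch_mean = [
        sum(p[c] for p in batch_probs) / len(batch_probs)
        for c in range(num_classes)
    ]
    mixed = [momentum * q + (1 - momentum) * b for q, b in zip(prior, batch_mean)]
    total = sum(mixed)
    return [v / total for v in mixed]  # renormalize to a valid distribution
```

In this sketch, the target prior could be moved gradually from the estimated unlabeled distribution toward the uniform distribution over training, which is one possible way to end with a balanced classifier; the schedule itself is left unspecified here.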