Long-tailed semi-supervised learning (LTSSL) represents a practical scenario for semi-supervised applications, challenged by skewed labeled distributions that bias classifiers. This problem is often aggravated by discrepancies between labeled and unlabeled class distributions, leading to biased pseudo-labels, neglect of rare classes, and poorly calibrated probabilities. To address these issues, we introduce Flexible Distribution Alignment (FlexDA), a novel adaptive logit-adjusted loss framework designed to dynamically estimate and align predictions with the actual distribution of unlabeled data and to achieve a balanced classifier by the end of training. FlexDA is further enhanced by a distillation-based consistency loss, promoting fair data usage across classes and effectively leveraging underconfident samples. This method, encapsulated in ADELLO (Align and Distill Everything All at Once), proves robust against label shift, significantly improves model calibration in LTSSL contexts, and surpasses previous state-of-the-art approaches across multiple benchmarks, including CIFAR100-LT, STL10-LT, and ImageNet127, addressing class imbalance challenges in semi-supervised learning. Our code is available at https://github.com/emasa/ADELLO-LTSSL.
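To make the central mechanism concrete: logit adjustment shifts a classifier's logits by a (scaled) log class prior before the softmax, so training compensates for class imbalance. The following is a minimal NumPy sketch of a standard logit-adjusted cross-entropy with an estimated prior; it illustrates the building block FlexDA adapts, not the authors' actual FlexDA loss (which dynamically estimates the unlabeled distribution and anneals the target during training). The function names and the scaling parameter `tau` here are illustrative.

```python
import numpy as np

def logit_adjusted_log_probs(logits, class_prior, tau=1.0):
    """Log-softmax over logits shifted by tau * log(prior).

    A larger prior boosts that class's adjusted logit, so the loss
    pushes harder on rare classes (standard logit adjustment).
    """
    adjusted = logits + tau * np.log(class_prior)
    adjusted = adjusted - adjusted.max(axis=1, keepdims=True)  # numerical stability
    return adjusted - np.log(np.exp(adjusted).sum(axis=1, keepdims=True))

def logit_adjusted_ce(logits, labels, class_prior, tau=1.0):
    """Mean cross-entropy on prior-adjusted logits."""
    logp = logit_adjusted_log_probs(logits, class_prior, tau)
    return -logp[np.arange(len(labels)), labels].mean()
```

With a uniform prior the adjustment adds the same constant to every logit, so the loss reduces to plain cross-entropy; with a long-tailed prior estimate, head-class logits are inflated during training, yielding a classifier that is debiased at inference.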