We address the challenging problem of Long-Tailed Semi-Supervised Learning (LTSSL), where the labeled data exhibit an imbalanced class distribution and the unlabeled data follow an unknown distribution. Unlike in balanced SSL, the generated pseudo-labels are skewed towards head classes, which intensifies the training bias. This phenomenon is amplified further when the class distributions of the labeled and unlabeled datasets are mismatched, since even more unlabeled data are then mislabeled as head classes. To solve this problem, we propose a novel method named ComPlementary Experts (CPE). Specifically, we train multiple experts to model various class distributions, each of which yields high-quality pseudo-labels under one form of class distribution. In addition, we introduce Classwise Batch Normalization for CPE to avoid the performance degradation caused by the feature distribution mismatch between head and non-head classes. CPE achieves state-of-the-art performance on the CIFAR-10-LT, CIFAR-100-LT, and STL-10-LT benchmarks. For instance, on CIFAR-10-LT, CPE improves test accuracy by over 2.22% compared to baselines. Code is available at https://github.com/machengcheng2016/CPE-LTSSL.
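As a rough illustration of how several experts could each target a different class distribution, the sketch below applies post-hoc logit adjustment with different intensities to one set of shared logits. This is an assumption made for illustration only, not necessarily the mechanism CPE uses: the function names, the exponential long-tailed prior, and the intensity values (0, 1, 2) are all hypothetical and do not come from the abstract.

```python
import math


def long_tailed_prior(num_classes, imbalance_ratio):
    """Hypothetical class prior decaying exponentially from head to tail."""
    counts = [imbalance_ratio ** (-k / (num_classes - 1)) for k in range(num_classes)]
    total = sum(counts)
    return [c / total for c in counts]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def expert_predictions(logits, prior, intensities=(0.0, 1.0, 2.0)):
    """Return one probability vector per hypothetical expert.

    Each expert subtracts tau * log(prior) from the shared logits:
    tau = 0 keeps the long-tailed bias of the labeled data,
    tau = 1 targets a roughly uniform distribution,
    tau = 2 targets a roughly inverse long-tailed distribution.
    """
    preds = []
    for tau in intensities:
        adjusted = [z - tau * math.log(p) for z, p in zip(logits, prior)]
        preds.append(softmax(adjusted))
    return preds


if __name__ == "__main__":
    prior = long_tailed_prior(num_classes=3, imbalance_ratio=100)
    # Logits from a model biased towards the head class (index 0).
    for probs in expert_predictions([4.0, 2.5, 0.0], prior):
        pseudo_label = max(range(len(probs)), key=probs.__getitem__)
        print(pseudo_label, [round(p, 3) for p in probs])
```

In this toy setting the three experts assign different pseudo-labels to the same sample. This shows why the choice of expert matters when the class distribution of the unlabeled data is unknown.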