Medical image classification is challenging because labeled samples are scarce and because the wide variation in disease prevalence produces class imbalance. Semi-supervised learning (SSL) methods can mitigate these challenges by leveraging both labeled and unlabeled data. However, SSL methods for medical image classification must address two key challenges: (1) estimating reliable pseudo-labels for images in the unlabeled dataset and (2) reducing the biases caused by class imbalance. In this paper, we propose SPLAL, a novel SSL approach that addresses both challenges. SPLAL leverages class prototypes and a weighted combination of classifiers to predict reliable pseudo-labels for a subset of the unlabeled images. Additionally, we introduce an alignment loss to mitigate the model's bias toward majority classes. To evaluate our approach, we conduct experiments on two publicly available medical image classification benchmarks: the skin lesion classification dataset (ISIC 2018) and the blood cell classification dataset (BCCD). The experimental results show that our approach outperforms several state-of-the-art SSL methods across multiple evaluation metrics. In particular, on the ISIC 2018 dataset our approach improves on the state-of-the-art approach in both Accuracy and F1 score, by relative margins of 2.24\% and 11.40\%, respectively. Finally, we conduct extensive ablation experiments that examine the contribution of each component of our approach and validate its effectiveness.
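The abstract does not spell out how prototypes and classifiers are combined, so the following is only a generic illustration of the idea: class prototypes are computed as mean labeled embeddings, cosine similarity to each prototype is turned into a probability distribution, that distribution is blended with a classifier's softmax output, and only a confident, agreeing subset of unlabeled images receives pseudo-labels. All names (`class_prototypes`, `select_pseudo_labels`), the equal blending weight, the temperature, the 0.8 threshold, and the agreement rule are hypothetical assumptions, not SPLAL's actual procedure.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def class_prototypes(feats, labels, num_classes):
    # Hypothetical prototype definition: mean labeled embedding per class, normalized.
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)


def select_pseudo_labels(unl_feats, clf_probs, protos,
                         weight=0.5, temperature=0.1, threshold=0.8):
    # Assumed scheme, not the paper's: blend prototype-based and classifier
    # probabilities, then keep only confident samples on which both agree.
    sims = l2_normalize(unl_feats) @ protos.T          # (N, C) cosine similarities
    proto_probs = softmax(sims / temperature, axis=1)  # similarity -> distribution
    combined = weight * clf_probs + (1.0 - weight) * proto_probs
    pseudo = combined.argmax(axis=1)
    conf = combined.max(axis=1)
    agree = clf_probs.argmax(axis=1) == proto_probs.argmax(axis=1)
    mask = agree & (conf >= threshold)                 # the selected "reliable" subset
    return pseudo, mask


# Toy usage with two classes and 2-D embeddings.
lab_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
lab_labels = np.array([0, 0, 1, 1])
protos = class_prototypes(lab_feats, lab_labels, num_classes=2)

unl_feats = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
clf_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
pseudo, mask = select_pseudo_labels(unl_feats, clf_probs, protos)
```

In this toy run the two unambiguous images are pseudo-labeled, while the image equidistant from both prototypes is left out of the selected subset. Restricting pseudo-labels to such a subset is one common way to trade coverage for reliability; the threshold and weight would need tuning on real data.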