The robustness of supervised deep learning-based medical image classification is significantly undermined by label noise. Although several methods have been proposed to improve classification performance in the presence of noisy labels, they face two challenges: 1) they struggle with class-imbalanced datasets, frequently mistaking minority-class samples for noisy samples; 2) they focus solely on maximizing performance on noisy datasets, without incorporating experts in the loop to actively clean the noisy labels. To mitigate these challenges, we propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning. This approach not only improves the robustness of medical image classification in the presence of noisy labels, but also iteratively improves the quality of the dataset by relabeling the important incorrect labels under a limited annotation budget. Furthermore, we introduce a novel Variance of Gradients approach in the LNL phase, which complements loss-based sample selection by also sampling under-represented samples. Using two imbalanced noisy medical classification datasets, we demonstrate that our proposed technique is superior to its predecessors at handling class imbalance, as it avoids misidentifying most clean samples from minority classes as noisy.
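To make the sample-selection idea concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that per-sample losses and per-sample gradients saved at several training checkpoints are already available as NumPy arrays, and that the two criteria are combined by a simple union. The function name `select_samples`, the quantile thresholds, and the union rule are all hypothetical choices made for illustration.

```python
import numpy as np


def select_samples(losses, grads, loss_quantile=0.5, vog_quantile=0.9):
    """Illustrative selection rule (assumed, not taken from the paper).

    losses: array of shape (n,), one loss value per sample.
    grads:  array of shape (k, n, d), per-sample gradients stored at
            k checkpoints, each flattened to d dimensions.
    Returns a boolean mask of selected samples and the per-sample
    Variance of Gradients (VoG) score.
    """
    # Loss-based criterion: small-loss samples are treated as likely clean.
    low_loss = losses <= np.quantile(losses, loss_quantile)

    # VoG score: variance across checkpoints, averaged over dimensions.
    vog = grads.var(axis=0).mean(axis=1)

    # Assumed complement: also keep samples with unusually high VoG,
    # which may be under-represented rather than mislabeled.
    high_vog = vog >= np.quantile(vog, vog_quantile)

    return low_loss | high_vog, vog


# Hypothetical usage with synthetic values.
rng = np.random.default_rng(0)
demo_losses = rng.random(100)
demo_grads = rng.normal(size=(5, 100, 16))
demo_mask, demo_vog = select_samples(demo_losses, demo_grads)
```

In this sketch a high-loss sample is still kept when its gradients vary strongly across checkpoints, which is one plausible way a gradient-variance score could prevent minority-class samples from being discarded by a purely loss-based filter.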