Artificial intelligence (AI) for healthcare, and for white blood cell cancer diagnosis in particular, is hindered by two primary challenges: the lack of large-scale labeled datasets for white blood cell (WBC) segmentation, and outdated segmentation methods. These challenges impede the development of more accurate, modern techniques for diagnosing WBC-related cancers. Addressing the first challenge calls for a semi-supervised learning framework that makes efficient use of the limited labeled data available. In this work, we propose a novel self-training pipeline that incorporates FixMatch. Self-training uses a model trained on labeled data to generate pseudo-labels for unlabeled data, then retrains on both the labeled and the pseudo-labeled data. FixMatch is a consistency-regularization algorithm that enforces the model's robustness to variations in the input image. We find that incorporating FixMatch into the self-training pipeline improves performance in the majority of cases. The self-training scheme with consistency regularization achieved the best performance using the DeepLab-V3 architecture with a ResNet-50 backbone, reaching 90.69%, 87.37%, and 76.49% on the Zheng 1, Zheng 2, and LISC datasets, respectively.
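The abstract gives no implementation details, so the following is only a minimal NumPy sketch of how the two ingredients could fit together for segmentation: a self-training step that turns a teacher's per-pixel predictions into hard pseudo-labels, and a pixel-wise FixMatch-style consistency loss. It assumes per-image logits of shape (C, H, W), a weak and a strong augmented view that remain spatially aligned (for example, photometric-only strong augmentation), and a confidence threshold `tau = 0.95` (the default reported for the original FixMatch; treat it as an assumption here). The function names are hypothetical and do not come from the authors' code.

```python
import numpy as np

def log_softmax(logits, axis=0):
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def generate_pseudo_labels(teacher_logits):
    # Self-training step (hypothetical helper): the teacher's hard per-pixel
    # class predictions become pseudo-labels for an unlabeled image.
    # teacher_logits: (C, H, W) -> returns (H, W) integer labels.
    return teacher_logits.argmax(axis=0)

def fixmatch_pixel_loss(weak_logits, strong_logits, tau=0.95):
    """Pixel-wise FixMatch-style consistency loss for one image (sketch).

    weak_logits, strong_logits: (C, H, W) model outputs for a weakly and a
    strongly augmented view of the same image, assumed spatially aligned.
    """
    probs_weak = np.exp(log_softmax(weak_logits))
    confidence = probs_weak.max(axis=0)   # (H, W) max class probability
    targets = probs_weak.argmax(axis=0)   # (H, W) hard pseudo-labels
    mask = confidence >= tau              # keep only confident pixels
    log_probs_strong = log_softmax(strong_logits)
    # Gather log p(target) of the strong view at every pixel.
    h_idx, w_idx = np.indices(targets.shape)
    ce = -log_probs_strong[targets, h_idx, w_idx]  # (H, W) cross-entropy
    # As in FixMatch, masked-out pixels contribute zero and the sum is
    # normalized by the total number of pixels, not just confident ones.
    return float((ce * mask).sum() / mask.size)

# Usage: class 0 dominates every pixel of the weak view.
weak = np.zeros((2, 2, 2))
weak[0] = 5.0
print(generate_pseudo_labels(weak))
print(fixmatch_pixel_loss(weak, weak))  # small loss: strong view agrees
```

Normalizing by all pixels rather than only confident ones keeps the loss from growing large when very few pixels pass the threshold early in training; whether the authors used this or a per-confident-pixel mean is not stated in the abstract.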