Recognizing the types of white blood cells (WBCs) in microscopic images of human blood smears is a fundamental task in pathology and hematology. Although previous studies have contributed significantly to the development of methods and datasets, few papers have investigated benchmarks or baselines that others can easily refer to. For instance, we observed notable variations in the reported accuracies of the same Convolutional Neural Network (CNN) model across different studies, yet no public implementation exists to reproduce these results. In this paper, we establish a benchmark for WBC recognition. Our results indicate that CNN-based models achieve high accuracy when trained and tested under similar imaging conditions, but their performance drops significantly when they are tested under different conditions. Moreover, the ResNet classifier, which has been widely employed in previous work, generalizes unreasonably poorly under domain shift because of batch normalization. We investigate this issue and suggest alternative normalization techniques that can mitigate it. We make fully reproducible code publicly available\footnote{\url{https://github.com/apple2373/wbc-benchmark}}.