Accurate morphological classification of white blood cells (WBCs) is an important step in the diagnosis of leukemia, a disease in which nonfunctional blast cells accumulate in the bone marrow. Recently, deep convolutional neural networks (CNNs) have been used successfully to classify leukocytes after training on single-cell images from a specific domain. Most CNN models assume that training and test data follow the same distribution, i.e., that the data are independent and identically distributed. As a result, they are not robust to differences in staining procedures, magnifications, resolutions, scanners, or imaging protocols, nor to variation across clinical centers or patient cohorts. In addition, domain-specific data imbalances degrade the generalization performance of classifiers. Here, we train a robust CNN for WBC classification by addressing both cross-domain data imbalance and domain shift. To this end, we use two loss functions and demonstrate their effectiveness in out-of-distribution (OOD) generalization. Our approach achieves the best macro F1 score among the existing methods compared and accounts for rare cell types. This is the first demonstration of imbalanced domain generalization in hematological cytomorphology and paves the way for robust single-cell classification methods for use in laboratories and clinics.
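To illustrate why a macro-averaged F1 score is a natural metric when rare cell types matter, the following minimal sketch computes it from scratch and compares it with plain accuracy. It is not the evaluation code behind the reported results: the function names `macro_f1` and `accuracy`, the class labels, and the toy predictions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the true class was different
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined here as 0 when the denominator is 0
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: nine cells of a common class and one rare blast cell.
y_true = ["neutrophil"] * 9 + ["blast"]
# A degenerate classifier that always predicts the majority class.
y_pred = ["neutrophil"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy case the classifier misses the only rare cell, so accuracy stays high while the macro F1 score drops sharply, because every class contributes equally to the average regardless of how many samples it has.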