Accurate morphological classification of white blood cells (WBCs) is an important step in the diagnosis of leukemia, a disease in which nonfunctional blast cells accumulate in the bone marrow. Recently, deep convolutional neural networks (CNNs) have been used successfully to classify leukocytes by training them on single-cell images from a specific domain. Most CNN models assume that the training and test data follow similar distributions, i.e., that the data are independently and identically distributed. As a result, they are not robust to differences in staining procedures, magnifications, resolutions, scanners, or imaging protocols, nor to variations across clinical centers or patient cohorts. In addition, domain-specific data imbalances degrade the generalization performance of classifiers. Here, we train a robust CNN for WBC classification by addressing both cross-domain data imbalance and domain shift. To this end, we use two loss functions and demonstrate their effectiveness in out-of-distribution (OOD) generalization. Our approach achieves a higher macro F1 score than existing methods and is able to account for rare cell types. This is the first demonstration of imbalanced domain generalization in hematological cytomorphology, and it paves the way for robust single-cell classification methods for application in laboratories and clinics.
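The abstract reports results as a macro F1 score, which averages per-class F1 values without weighting by class frequency, so rare cell types count as much as common ones. The sketch below is only an illustration of that metric, not the authors' evaluation code; the function name `macro_f1` and the toy labels are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: nine common cells and one rare cell.
y_true = ["neutrophil"] * 9 + ["blast"]
# A classifier that never predicts the rare class.
y_pred = ["neutrophil"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, macro F1={score:.3f}")
```

In this toy case the classifier that ignores the rare class still reaches high accuracy, while its macro F1 is roughly halved because the missed class contributes an F1 of zero. This is why macro F1 is a common choice when class imbalance matters.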