Novel Pipeline for Diagnosing Acute Lymphoblastic Leukemia Sensitive to Related Biomarkers

Acute Lymphoblastic Leukemia (ALL) is one of the most common blood cancers in children. Starting treatment quickly is critical to saving the patient's life, so early diagnosis is essential. Examining blood smear images is one of the methods expert doctors use to diagnose the disease. Deep learning has advanced significantly in recent years and now has numerous applications in medicine. ALL diagnosis is no exception, and several machine learning-based methods have been proposed for it. Previous methods reported high diagnostic accuracy, but our work showed that accuracy alone is not sufficient, because a model can reach it by taking shortcuts instead of making meaningful decisions. This issue arises from the small size of medical training datasets. To address it, we constrained our model to follow a pipeline inspired by the way experts work. We also demonstrated that, because a judgement based on a single image is insufficient, the problem must be redefined as a multiple-instance learning problem to achieve a practical result. Our model is the first to provide a solution to this problem in a multiple-instance learning setup. We introduced a novel pipeline for diagnosing ALL that approximates the process used by hematologists, is sensitive to disease biomarkers, and achieves an accuracy of 96.15%, an F1-score of 94.24%, a sensitivity of 97.56%, and a specificity of 90.91% on ALL IDB 1. We further evaluated the method on an out-of-distribution dataset, a challenging test on which it achieved acceptable performance. Notably, our model was trained on a relatively small dataset, which suggests that the approach could be applied to other medical datasets with limited data.
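The abstract does not specify how per-image evidence is combined into a patient-level decision, or exactly how its metrics are averaged. The following is therefore a minimal Python sketch under stated assumptions, not the authors' pipeline. Each patient is treated as a bag of per-image (or per-cell) ALL probabilities produced by some instance classifier. The bag is pooled with max pooling, the standard multiple-instance assumption that one positive instance makes the bag positive, with mean pooling shown as a softer alternative. The four reported metrics are then computed from a binary confusion matrix. The names `bag_score`, `bag_label`, and `binary_metrics`, the 0.5 threshold, and both pooling rules are hypothetical. Because the abstract does not say whether its F1-score is for the ALL class alone or macro-averaged over both classes, the sketch returns both.

```python
"""Hypothetical sketch: patient-level ALL decision via multiple-instance pooling,
plus accuracy, sensitivity, specificity, and F1. Not the authors' implementation."""
from dataclasses import dataclass
from typing import Sequence


def bag_score(instance_scores: Sequence[float], pooling: str = "max") -> float:
    """Pool per-instance ALL probabilities (one bag = one patient) into one score.

    "max" encodes the standard MIL assumption: a bag is positive if at least one
    instance is positive. "mean" is a softer alternative. Both are assumptions.
    """
    if len(instance_scores) == 0:
        raise ValueError("a bag must contain at least one instance")
    if pooling == "max":
        return max(instance_scores)
    if pooling == "mean":
        return sum(instance_scores) / len(instance_scores)
    raise ValueError(f"unknown pooling: {pooling!r}")


def bag_label(instance_scores: Sequence[float], threshold: float = 0.5,
              pooling: str = "max") -> int:
    """Return 1 (ALL) if the pooled score reaches the threshold, else 0 (healthy)."""
    return int(bag_score(instance_scores, pooling) >= threshold)


def _ratio(num: int, den: int) -> float:
    # Avoid ZeroDivisionError when a class is absent from the evaluation set.
    return num / den if den else 0.0


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # recall on the ALL (positive) class
    specificity: float  # recall on the healthy (negative) class
    f1_positive: float  # F1 of the ALL class only
    f1_macro: float     # unweighted mean of the two per-class F1 scores


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute patient-level metrics from binary labels (1 = ALL, 0 = healthy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    f1_pos = _ratio(2 * tp, 2 * tp + fp + fn)  # F1 treating ALL as positive
    f1_neg = _ratio(2 * tn, 2 * tn + fn + fp)  # F1 treating healthy as positive
    return Metrics(
        accuracy=_ratio(tp + tn, len(pairs)),
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        f1_positive=f1_pos,
        f1_macro=(f1_pos + f1_neg) / 2,
    )


if __name__ == "__main__":
    # Hypothetical bags of per-image ALL probabilities for two patients.
    patient_a = [0.10, 0.20, 0.85]  # a single suspicious image
    patient_b = [0.30, 0.20, 0.10]
    print(bag_label(patient_a), bag_label(patient_b))  # 1 0
    print(bag_label(patient_a, pooling="mean"))       # 0, since the mean 0.383 < 0.5

    # Hypothetical evaluation counts: 40 TP, 1 FN, 10 TN, 1 FP.
    y_true = [1] * 41 + [0] * 11
    y_pred = [1] * 40 + [0] + [0] * 10 + [1]
    m = binary_metrics(y_true, y_pred)
    print(f"acc={m.accuracy:.2%} sens={m.sensitivity:.2%} "
          f"spec={m.specificity:.2%} f1_pos={m.f1_positive:.2%} "
          f"f1_macro={m.f1_macro:.2%}")
```

The pooling choice matters. Under max pooling, patient A is flagged on the strength of one image, while mean pooling dilutes that evidence and returns healthy. The usage counts are purely illustrative and are not the paper's confusion matrix. They yield 96.15% accuracy, 97.56% sensitivity, and 90.91% specificity. The ALL-class F1 is 97.56% and the macro-averaged F1 is 94.24%, which shows how much the reported F1 depends on the averaging convention.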
