Accurately classifying lymphoma subtypes from hematoxylin and eosin (H&E)-stained tissue is difficult because these cancers exhibit a wide range of morphological features. We present LymphoML, an interpretable machine learning method that identifies morphologic features correlated with lymphoma subtypes. Our method processes H&E-stained tissue microarray cores, segments nuclei and cells, computes features encompassing morphology, texture, and architecture, and trains gradient-boosted models to make diagnostic predictions. On a dataset of 670 cases from Guatemala spanning 8 lymphoma subtypes, LymphoML's interpretable models, developed on a limited volume of H&E-stained tissue, achieve diagnostic accuracy non-inferior to that of pathologists using whole-slide images and outperform black-box deep learning. Using SHapley Additive exPlanations (SHAP) analysis, we assess the impact of each feature on model predictions and find that nuclear shape features are most discriminative for diffuse large B-cell lymphoma (DLBCL; F1-score: 78.7%) and classical Hodgkin lymphoma (F1-score: 74.5%). Finally, we provide the first demonstration that a model combining features from H&E-stained tissue with features from a standardized panel of 6 immunostains achieves diagnostic accuracy (85.3%) similar to that of a model using a 46-stain panel (86.1%).
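The abstract does not specify how the nuclear shape features are computed. Purely as an illustration, the Python sketch below derives a few moment-based shape descriptors (area, major and minor axis length, eccentricity) for each nucleus in an instance-segmentation label map, then summarizes them per tissue microarray core by mean and standard deviation, yielding one tabular feature vector per core of the kind a gradient-boosted classifier could consume. The helper names `nucleus_shape` and `core_features`, the choice of descriptors, and the mean/std aggregation are all hypothetical assumptions, not LymphoML's actual feature set.

```python
import numpy as np


def nucleus_shape(mask: np.ndarray) -> dict:
    """Moment-based shape descriptors for one binary nucleus mask.

    Hypothetical helper for illustration; not LymphoML's actual feature code.
    """
    rows, cols = np.nonzero(mask)
    area = rows.size
    if area < 2:
        raise ValueError("nucleus mask needs at least two pixels")
    # Population covariance of pixel coordinates = second central moments / area.
    cov = np.cov(np.stack([rows, cols]).astype(float), bias=True)
    # Eigenvalues in ascending order; clip tiny negative round-off to zero.
    small, large = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return {
        "area": float(area),
        # Full axis lengths of the ellipse with the same second moments.
        "major_axis": float(4.0 * np.sqrt(large)),
        "minor_axis": float(4.0 * np.sqrt(small)),
        # 0 for a circular nucleus, approaching 1 for a very elongated one.
        "eccentricity": float(np.sqrt(1.0 - small / large)) if large > 0 else 0.0,
    }


def core_features(label_image: np.ndarray) -> dict:
    """Aggregate per-nucleus shape features into one vector for a TMA core.

    Assumes an integer instance-segmentation map: 0 = background,
    1..K = individual nuclei. Mean/std aggregation is an assumption.
    """
    per_nucleus = []
    for label in np.unique(label_image):
        if label == 0:
            continue
        mask = label_image == label
        if mask.sum() >= 2:  # skip single-pixel fragments
            per_nucleus.append(nucleus_shape(mask))
    if not per_nucleus:
        return {}
    summary = {}
    for key in per_nucleus[0]:
        values = np.array([f[key] for f in per_nucleus])
        summary[f"{key}_mean"] = float(values.mean())
        summary[f"{key}_std"] = float(values.std())
    summary["nucleus_count"] = float(len(per_nucleus))
    return summary


# Usage on a synthetic "core": one round nucleus and one elongated nucleus.
yy, xx = np.mgrid[0:64, 0:64]
labels = np.zeros((64, 64), dtype=int)
labels[(yy - 16) ** 2 + (xx - 16) ** 2 <= 64] = 1  # round, radius 8 px
labels[40:44, 10:50] = 2  # elongated, 4 x 40 px
feats = core_features(labels)
print(feats)
```

Moment-based descriptors are used here instead of a perimeter-based circularity because pixel-counted perimeters are strongly biased on a digital grid, whereas second moments behave well even for small nuclei.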