The accurate classification of lymphoma subtypes from hematoxylin and eosin (H&E)-stained tissue is complicated by the wide range of morphological features these cancers can exhibit. We present LymphoML, an interpretable machine learning method that identifies morphologic features that correlate with lymphoma subtypes. Our method processes H&E-stained tissue microarray cores, segments nuclei and cells, computes features encompassing morphology, texture, and architecture, and trains gradient-boosted models to make diagnostic predictions. On a dataset of 670 cases from Guatemala spanning 8 lymphoma subtypes, LymphoML's interpretable models, developed on a limited volume of H&E-stained tissue, achieve diagnostic accuracy non-inferior to that of pathologists using whole-slide images and outperform black-box deep learning. Using SHapley Additive exPlanation (SHAP) analysis, we assess the impact of each feature on model predictions and find that nuclear shape features are the most discriminative for diffuse large B-cell lymphoma (DLBCL; F1-score: 78.7%) and classical Hodgkin lymphoma (F1-score: 74.5%). Finally, we provide the first demonstration that a model combining features from H&E-stained tissue with features from a standardized panel of 6 immunostains achieves diagnostic accuracy (85.3%) similar to that of a 46-stain panel (86.1%).
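The abstract reports that nuclear shape features are among the most discriminative. As a rough illustration of what such per-nucleus shape descriptors can look like, the stdlib-only Python sketch below computes area, perimeter, circularity, solidity, and bounding-box aspect ratio from a single nucleus contour. It assumes each segmented nucleus is available as an ordered list of polygon vertices in pixel coordinates. The function names and the specific feature set are hypothetical illustrations, not LymphoML's actual feature definitions or code.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_area(contour: List[Point]) -> float:
    """Shoelace formula; vertices must be ordered (either orientation)."""
    s = 0.0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def polygon_perimeter(contour: List[Point]) -> float:
    """Sum of edge lengths around the closed contour."""
    n = len(contour)
    return sum(math.dist(contour[i], contour[(i + 1) % n]) for i in range(n))


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def nuclear_shape_features(contour: List[Point]) -> Dict[str, float]:
    """Hypothetical per-nucleus shape descriptors (illustrative only)."""
    if len(contour) < 3:
        raise ValueError("a contour needs at least 3 vertices")
    area = polygon_area(contour)
    if area == 0:
        raise ValueError("degenerate contour with zero area")
    perimeter = polygon_perimeter(contour)
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    return {
        "area": area,
        "perimeter": perimeter,
        # 1.0 for a perfect circle, lower for irregular or elongated outlines
        "circularity": 4.0 * math.pi * area / perimeter ** 2,
        # 1.0 for convex shapes, lower for notched or lobulated nuclei
        "solidity": area / polygon_area(convex_hull(contour)),
        # axis-aligned bounding-box elongation (>= 1.0)
        "aspect_ratio": max(width, height) / min(width, height),
    }


if __name__ == "__main__":
    square = [(0, 0), (10, 0), (10, 10), (0, 10)]
    print(round(nuclear_shape_features(square)["circularity"], 3))  # 0.785
    l_shape = [(0, 0), (10, 0), (10, 4), (4, 4), (4, 10), (0, 10)]
    print(round(nuclear_shape_features(l_shape)["solidity"], 3))  # 0.78
```

In a pipeline of the kind described, per-nucleus values like these would typically be aggregated per tissue core (for example, by mean or quantiles) before being passed to a gradient-boosted classifier; how LymphoML performs that aggregation is not specified in this abstract.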