Variation in nuclear size and shape is an important criterion of malignancy for many tumor types; however, categorical estimates by pathologists have poor reproducibility. Measurements of nuclear characteristics (morphometry) can improve reproducibility, but manual methods are time-consuming. In this study, we evaluated fully automated morphometry using a deep learning-based algorithm in 96 canine cutaneous mast cell tumors with information on patient survival. Algorithmic morphometry was compared with karyomegaly estimates by 11 pathologists, manual nuclear morphometry of 12 cells by 9 pathologists, and the mitotic count as a benchmark. The prognostic value of automated morphometry was high: for the standard deviation (SD) of nuclear area, the area under the ROC curve (AUC) for tumor-specific survival was 0.943 (95% CI: 0.889 - 0.996), which was higher than the AUC of manual morphometry for all pathologists combined (0.868, 95% CI: 0.737 - 0.991) and of the mitotic count (0.885, 95% CI: 0.765 - 1.00). At the proposed thresholds, the hazard ratio was 18.3 (95% CI: 5.0 - 67.1) for algorithmic morphometry (SD of nuclear area $\geq 9.0 \mu m^2$), 9.0 (95% CI: 6.0 - 13.4) for manual morphometry (SD of nuclear area $\geq 10.9 \mu m^2$), 7.6 (95% CI: 5.7 - 10.1) for karyomegaly estimates, and 30.5 (95% CI: 7.8 - 118.0) for the mitotic count. Inter-rater reproducibility for karyomegaly estimates was fair ($\kappa$ = 0.226), and sensitivity and specificity varied widely between individual pathologists. Reproducibility for manual morphometry (SD of nuclear area) was good (ICC = 0.654). This study supports the use of algorithmic morphometry as a prognostic test to overcome the limitations of estimates and manual measurements.
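The prognostic criterion reported above (SD of nuclear area at or above a threshold) can be illustrated with a minimal sketch. Only the 9.0 µm² threshold for algorithmic morphometry comes from the abstract. The function names, the example area values, and the use of the sample standard deviation (n - 1 denominator) are assumptions made for illustration, and this is not the authors' implementation.

```python
from statistics import stdev

# Threshold for algorithmic morphometry as stated in the abstract (in square micrometers).
ALGORITHMIC_SD_THRESHOLD_UM2 = 9.0


def nuclear_area_sd(areas_um2):
    """Return the SD of per-nucleus areas for one tumor.

    Assumption: the sample SD (n - 1 denominator) is used; the abstract
    does not state which estimator the authors applied.
    """
    if len(areas_um2) < 2:
        raise ValueError("at least two nuclear area measurements are required")
    return stdev(areas_um2)


def meets_threshold(areas_um2, threshold_um2=ALGORITHMIC_SD_THRESHOLD_UM2):
    """Return True if the SD of nuclear area is at or above the threshold."""
    return nuclear_area_sd(areas_um2) >= threshold_um2


# Hypothetical nuclear areas (square micrometers); not data from the study.
uniform_nuclei = [30.0, 32.0, 31.0, 29.0, 33.0]
variable_nuclei = [25.0, 60.0, 31.0, 48.0, 29.0]

print(meets_threshold(uniform_nuclei))   # low variation in nuclear size
print(meets_threshold(variable_nuclei))  # high variation in nuclear size
```

In practice the area list would come from the segmentation output of the algorithm, with many more nuclei per tumor than in this toy example.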