Treatment of colorectal cancer (CRC) depends strongly on the molecular subtype: immunotherapy has shown efficacy in tumors with microsatellite instability (MSI) but is ineffective in the microsatellite stable (MSS) subtype. Deep neural networks (DNNs) are a promising way to automate CRC subtype classification from Hematoxylin and Eosin (H\&E) stained whole-slide images (WSIs). Because WSIs are very large, Multiple Instance Learning (MIL) techniques are typically used. However, existing MIL methods focus on identifying the most representative image patches for classification, which may discard critical information. These methods also often overlook clinically relevant information, such as the tendency of MSI tumors to occur predominantly in the proximal (right-sided) colon. We introduce `CIMIL-CRC', a DNN framework that: 1) solves the MSI/MSS MIL problem by efficiently combining a pre-trained feature extraction model with principal component analysis (PCA) to aggregate information from all patches, and 2) integrates clinical priors, particularly the tumor location within the colon, into the model to enhance patient-level classification accuracy. We evaluated CIMIL-CRC by the average area under the receiver operating characteristic curve (AUROC) in a 5-fold cross-validation setup used for model development on the TCGA-CRC-DX cohort, comparing it with a baseline patch-level classification approach, a MIL-only approach, and a clinically informed patch-level classification approach. CIMIL-CRC outperformed all three (AUROC: $0.92\pm0.002$ (95\% CI 0.91-0.92), vs. $0.79\pm0.02$ (95\% CI 0.76-0.82), $0.86\pm0.01$ (95\% CI 0.85-0.88), and $0.87\pm0.01$ (95\% CI 0.86-0.88), respectively). The improvement was statistically significant.
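The two ideas in the abstract, aggregating all patch features with PCA and adding tumor location as a clinical prior, can be illustrated with a minimal sketch. This is not the CIMIL-CRC implementation: the abstract does not specify how PCA is applied or how the location is encoded. The sketch assumes patch embeddings are already available as an `(n_patches, dim)` array, that a slide is summarized by its top principal axes, and that location is a single proximal/distal flag. The names `pca_slide_descriptor` and `add_location_prior` are hypothetical.

```python
import numpy as np


def pca_slide_descriptor(patch_features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Summarize ALL patch embeddings of one slide by its top principal axes.

    Assumption: the slide-level descriptor is the flattened set of leading
    principal directions of the patch-feature matrix.
    """
    if patch_features.ndim != 2:
        raise ValueError("patch_features must have shape (n_patches, dim)")
    n_patches, dim = patch_features.shape
    if not 1 <= n_components <= min(n_patches, dim):
        raise ValueError("n_components must be in [1, min(n_patches, dim)]")

    # Center the features, then obtain principal axes from the SVD.
    centered = patch_features - patch_features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axes = vt[:n_components]

    # Principal axes are defined only up to sign; make the entry with the
    # largest magnitude positive so the descriptor is reproducible.
    rows = np.arange(n_components)
    signs = np.sign(axes[rows, np.argmax(np.abs(axes), axis=1)])
    signs[signs == 0] = 1.0
    return (axes * signs[:, None]).reshape(-1)


def add_location_prior(descriptor: np.ndarray, is_proximal: bool) -> np.ndarray:
    """Append a binary tumor-location flag (1.0 = proximal/right-sided colon).

    Assumption: a single binary feature stands in for the clinical prior.
    """
    return np.concatenate([descriptor, [1.0 if is_proximal else 0.0]])
```

The descriptor has a fixed length of `n_components * dim` regardless of how many patches a slide contains, and it does not depend on patch order, so it can be passed to any standard patient-level classifier together with the location flag.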