The rapidly emerging field of deep learning-based computational pathology has shown promising results in using whole slide images (WSIs) to objectively predict the prognosis of cancer patients. However, most prognostic methods are currently limited to either histopathology or genomics alone, which inevitably reduces their potential to predict patient prognosis accurately. Integrating WSIs and genomic features, in turn, presents three main challenges: (1) the enormous heterogeneity of gigapixel WSIs, which can reach sizes as large as 150,000 × 150,000 pixels; (2) the absence of a spatial correspondence between histopathology images and genomic molecular data; and (3) the difficulty that existing early, late, and intermediate multimodal feature fusion strategies have in capturing explicit interactions between WSIs and genomics. To address these challenges, we propose the Mutual-Guided Cross-Modality Transformer (MGCT), a weakly supervised, attention-based multimodal learning framework that combines histology features and genomic features to model genotype-phenotype interactions within the tumor microenvironment. To validate the effectiveness of MGCT, we conduct experiments using nearly 3,600 gigapixel WSIs across five different cancer types sourced from The Cancer Genome Atlas (TCGA). Extensive experimental results consistently show that MGCT outperforms state-of-the-art (SOTA) methods.
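To make the idea of attention-based fusion between the two modalities more concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in which genomic embeddings act as queries over a bag of WSI patch features. It is not the MGCT architecture itself, which the abstract does not specify in detail: the function names, the feature dimension, the number of patches and gene groups, and the absence of learned projection matrices are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    queries: list of n_q vectors of length d
    keys:    list of n_k vectors of length d
    values:  list of n_k vectors of length d_v
    Returns (outputs, weights): n_q vectors of length d_v, and an
    n_q x n_k attention matrix whose rows each sum to 1.
    """
    d = len(queries[0])
    d_v = len(values[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Attention-weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(d_v)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical toy inputs; all sizes are illustrative, not taken from the paper.
rng = random.Random(0)
dim = 8
patch_features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(50)]
genomic_embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]

# Genomics-guided direction (assumed): gene groups query the patch bag.
fused, attn = cross_attention(genomic_embeddings, patch_features, patch_features)
```

Each row of `attn` indicates which patches a given gene group weights most heavily, which is one way such a model can relate the modalities without any spatial alignment between them. Swapping the roles of the two inputs would give the histology-guided direction; a practical model would also apply learned query, key, and value projections and use multiple heads.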