Emerging research has shown that artificial intelligence-based multimodal fusion of digital pathology and transcriptomic features can improve cancer diagnosis (grading/subtyping) and prognosis (survival risk) prediction. However, such direct fusion is impractical in clinical settings, where histopathology remains the gold standard and transcriptomic tests are rarely requested in public healthcare. Here we show that with our diffusion-based cross-modal generative AI model, PathGen, gene expressions synthesized from digital histopathology jointly predict cancer grading and patient survival risk with high accuracy (state-of-the-art performance), certainty (through a conformal coverage guarantee) and interpretability (through distributed co-attention maps). We experiment on two publicly available multimodal datasets, The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium, spanning four independent cohorts (glioma-glioblastoma, renal, uterine, and breast), and observe significant performance gains in grading and risk estimation (p < 0.05) when synthesized transcriptomic data are combined with whole-slide images (WSIs). Predictions using synthesized features also did not differ significantly from those obtained with real transcriptomic data (p > 0.05), consistently across cohorts. PathGen code is available for open use on GitHub at https://github.com/Samiran-Dey/PathGen.