While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomics data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design and SHapley Additive exPlanations, DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features carry modality-shared information, including one that captures solid tumor morphology contextualized primarily by late estrogen response, in which higher-grade morphology aligns with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.