Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction

Integrating whole-slide images (WSIs) and bulk transcriptomics to predict patient survival can improve our understanding of patient prognosis. However, this multimodal task is particularly challenging because the two data types differ in nature: WSIs provide a very high-dimensional spatial description of a tumor, whereas bulk transcriptomics provide a global description of gene expression levels within that tumor. In this context, our work addresses two key challenges: (1) how can we tokenize transcriptomics in a semantically meaningful and interpretable way, and (2) how can we capture dense multimodal interactions between these two modalities? Specifically, we propose to learn biological pathway tokens from transcriptomics, each of which can encode a specific cellular function. Together with histology patch tokens, which encode the different morphological patterns in the WSI, we argue that they form appropriate reasoning units for downstream interpretability analyses. We fuse both modalities using a memory-efficient multimodal Transformer that can model interactions between pathway and histology patch tokens. Our proposed model, SURVPATH, achieves state-of-the-art performance when evaluated against both unimodal and multimodal baselines on five datasets from The Cancer Genome Atlas. Our interpretability framework identifies key multimodal prognostic factors and, as such, can provide valuable insights into the interaction between genotype and phenotype, enabling a deeper understanding of the underlying biological mechanisms at play. Our code is publicly available at: https://github.com/ajv012/SurvPath.
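
To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the released SURVPATH implementation. It assumes that (a) each pathway token is a linear projection of the expression values of the genes in that pathway, and (b) memory efficiency is obtained by computing pathway-to-pathway, pathway-to-histology, and histology-to-pathway attention while skipping the large histology-to-histology block. All function names, the toy gene sets, the dimensions, and the use of identity query/key/value projections are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tokenize_pathways(expression, pathway_gene_idx, projections):
    """Build one d-dimensional token per pathway (hypothetical scheme).

    expression:       (n_genes,) bulk expression vector for one patient
    pathway_gene_idx: list of integer index arrays, one per pathway
    projections:      list of (len(idx), d) matrices, one per pathway
    """
    return np.stack(
        [expression[idx] @ W for idx, W in zip(pathway_gene_idx, projections)]
    )

def sparse_cross_modal_attention(P, H):
    """Attention without the histology-to-histology block (assumed design).

    P: (n_pathways, d) pathway tokens
    H: (n_patches, d)  histology patch tokens
    Query/key/value projections are taken as identity for brevity.
    """
    scale = np.sqrt(P.shape[1])
    tokens = np.concatenate([P, H], axis=0)
    # Pathway queries attend to every token (pathways and patches).
    attn_p = softmax(P @ tokens.T / scale)   # (n_pathways, n_pathways + n_patches)
    out_p = attn_p @ tokens
    # Patch queries attend to pathway tokens only, so no
    # (n_patches x n_patches) matrix is ever materialized.
    attn_h = softmax(H @ P.T / scale)        # (n_patches, n_pathways)
    out_h = attn_h @ P
    return out_p, out_h, attn_p, attn_h

# Toy example: 20 genes grouped into 3 made-up pathways, 50 patches, d = 8.
rng = np.random.default_rng(0)
d = 8
expression = rng.normal(size=20)
pathway_gene_idx = [np.arange(0, 5), np.arange(5, 12), np.arange(12, 20)]
projections = [rng.normal(size=(len(idx), d)) for idx in pathway_gene_idx]

P = tokenize_pathways(expression, pathway_gene_idx, projections)
H = rng.normal(size=(50, d))  # stand-in for patch embeddings
out_p, out_h, attn_p, attn_h = sparse_cross_modal_attention(P, H)
print(P.shape, out_p.shape, out_h.shape, attn_p.shape, attn_h.shape)
```

Under these assumptions the attention cost grows linearly with the number of patches rather than quadratically, and the rows of `attn_p` and `attn_h` give pathway-to-patch and patch-to-pathway weights that could feed the kind of interpretability analysis the abstract describes.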
