ProtoPathway: Biologically Structured Prototype-Pathway Fusion for Multimodal Cancer Survival Prediction

We introduce ProtoPathway, an interpretable-by-design multimodal framework for cancer survival prediction that unifies whole slide imaging and transcriptomics through encoders that produce biologically grounded representations on both sides of the fusion. On the histopathology side, $K$ learnable morphological prototypes, trained end-to-end with the survival objective, serve as the slide representation itself: patches flow into prototype tokens via soft assignment, compressing variable-length patch sets into a fixed number of task-adaptive tokens. On the genomic side, a bipartite graph neural network encodes gene expression within the Reactome pathway hierarchy; bidirectional message passing over a shared gene--pathway graph produces pathway embeddings that reflect both their constituent genes and their broader biological context. Cross-modal attention then operates over a compact prototype $\times$ pathway matrix in which prototypes query pathways, modeling the biological direction in which molecular programs give rise to tissue morphology. Because both axes carry a stable, task-learned identity, the attention matrix is itself an interpretability output, yielding native inference-time attribution across the full biological hierarchy, from genes through pathways and prototypes to spatial tissue maps. We evaluate on five TCGA cancer cohorts, demonstrating competitive or superior survival prediction with substantially improved biological interpretability and reduced computational cost, with interpretability claims validated through fold-stratified, rank-based, population-level analysis. Our source code, model weights, and Reactome pathways, together with a unified codebase reimplementing all multimodal survival baselines under identical preprocessing and evaluation, are available at: https://github.com/AmayaGS/ProtoPathway.
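
As a rough illustration of the two operations described above (soft assignment of patches to prototypes, followed by cross-attention in which prototypes query pathways), the following is a minimal pure-Python sketch. It is not the released implementation: the dot-product similarity, the mass-normalized pooling, the omission of learned query/key/value projections, and all function names and dimensions are assumptions made for illustration, and the pathway embeddings are random stand-ins for the output of the graph encoder.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def soft_assign(patches, prototypes):
    """Pool N patch vectors into K prototype tokens (assumed pooling rule).

    Each patch spreads unit mass over the K prototypes via a softmax on
    dot-product similarity; token k is the mass-weighted mean of the patches.
    """
    num_protos, dim = len(prototypes), len(prototypes[0])
    weights = [softmax([dot(p, c) for c in prototypes]) for p in patches]
    tokens = []
    for k in range(num_protos):
        mass = sum(w[k] for w in weights)  # strictly positive under softmax
        tokens.append([
            sum(w[k] * p[j] for w, p in zip(weights, patches)) / mass
            for j in range(dim)
        ])
    return tokens


def cross_attention(prototype_tokens, pathway_embeddings):
    """Prototypes (queries) attend over pathways (keys and values).

    Returns the K x P attention matrix and the K fused tokens. Learned
    projections are omitted here for brevity (an assumption of this sketch).
    """
    dim = len(pathway_embeddings[0])
    scale = math.sqrt(dim)
    attn = [
        softmax([dot(q, k) / scale for k in pathway_embeddings])
        for q in prototype_tokens
    ]
    fused = [
        [sum(a * v[j] for a, v in zip(row, pathway_embeddings)) for j in range(dim)]
        for row in attn
    ]
    return attn, fused


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: 50 patches, K = 4 prototypes, 6 pathways, dimension 8.
rng = random.Random(0)
patches = random_matrix(50, 8, rng)
prototypes = random_matrix(4, 8, rng)
pathways = random_matrix(6, 8, rng)  # stand-in for graph-encoder output

tokens = soft_assign(patches, prototypes)
attn, fused = cross_attention(tokens, pathways)
print(len(attn), "x", len(attn[0]), "prototype-by-pathway attention matrix")
```

Because the attention matrix has one row per prototype and one column per pathway, each row can be read directly as that prototype's distribution over pathways, and its size does not depend on how many patches the slide contains.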
