Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology

Histopathology Whole-Slide Images (WSIs) are an important tool for assessing cancer prognosis in computational pathology (CPATH). Although existing survival analysis (SA) approaches have made notable progress, they generally rely on highly expressive architectures and only coarse-grained patient-level labels to learn prognostic visual representations from gigapixel WSIs. This learning paradigm faces significant performance bottlenecks given the scarce training data and the standard multi-instance learning (MIL) framework currently used in CPATH. To overcome these bottlenecks, this paper proposes, for the first time, a new Vision-Language-based SA (VLSA) paradigm. Concretely, (1) VLSA is driven by pathology VL foundation models; it no longer relies on high-capacity networks and is data-efficient. (2) On the vision end, VLSA encodes a prognostic language prior and uses it as an auxiliary signal to guide the aggregation of prognostic visual features at the instance level, thereby compensating for the weak supervision in MIL. Moreover, given the characteristics of SA, we propose (i) ordinal survival prompt learning, which transforms continuous survival labels into textual prompts, and (ii) an ordinal incidence function as the prediction target, which makes SA compatible with VL-based prediction. Notably, VLSA's predictions can be interpreted intuitively with our Shapley-value-based method. Extensive experiments on five datasets confirm the effectiveness of our scheme. VLSA could pave a new way for SA in CPATH by offering weakly supervised MIL an effective means of learning valuable prognostic clues from gigapixel WSIs. Our source code is available at https://github.com/liupei101/VLSA.
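To make the two SA-specific ideas above more concrete, the following is a minimal illustrative sketch, not the VLSA implementation. It assumes that continuous survival times are discretized into ordered bins by quantiles, that each bin is paired with a text prompt, and that a cumulative incidence curve is obtained by accumulating per-bin probabilities. The function names, the prompt wording, and the toy data are all hypothetical; the released code may differ in every one of these choices.

```python
from bisect import bisect_right
from itertools import accumulate


def quantile_edges(times, n_bins):
    """Return n_bins - 1 interior cut points taken at evenly spaced ranks
    of the sorted times (an assumed, simple discretization rule)."""
    s = sorted(times)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]


def to_ordinal_bin(t, edges):
    """Map a continuous survival time to an ordinal bin index (0 = shortest)."""
    return bisect_right(edges, t)


def cumulative_incidence(bin_probs):
    """Accumulate per-bin event probabilities into a non-decreasing curve."""
    return list(accumulate(bin_probs))


# Hypothetical ordered prompts, one per bin; the wording is an assumption.
PROMPTS = [
    "very poor prognosis",
    "poor prognosis",
    "good prognosis",
    "very good prognosis",
]

# Toy survival times in months (made up for illustration only).
times = [5.0, 12.0, 20.0, 33.0, 41.0, 58.0, 70.0, 95.0]
edges = quantile_edges(times, n_bins=len(PROMPTS))

# Continuous label -> ordinal bin -> textual prompt.
labels = [to_ordinal_bin(t, edges) for t in times]
prompt_for_first = PROMPTS[labels[0]]

# Per-bin probabilities (e.g. from normalized image-text similarities)
# -> cumulative incidence, and its complement as a survival curve.
cif = cumulative_incidence([0.1, 0.2, 0.3, 0.4])
survival = [1.0 - c for c in cif]
```

The ordinal structure shows up in two places in this sketch: bin indices preserve the ordering of survival times, and the accumulated curve is monotone by construction whenever the per-bin probabilities are non-negative.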
