Survival prediction for cancers is crucial in clinical practice, as it informs mortality risk and guides treatment planning. However, a static model trained on a single dataset fails to adapt to the dynamically evolving clinical environment and continuous data streams, limiting its practical utility. While continual learning (CL) offers a way to learn dynamically from new datasets, existing CL methods focus primarily on unimodal inputs and suffer from severe catastrophic forgetting in survival prediction. In real-world scenarios, multimodal inputs such as whole slide images and genomics often provide comprehensive, complementary information, and neglecting inter-modal correlations degrades performance. To address the two challenges of catastrophic forgetting and the complex inter-modal interactions between gigapixel whole slide images and genomics, we propose ConSurv, the first multimodal continual learning (MMCL) method for survival analysis. ConSurv comprises two key components: Multi-staged Mixture of Experts (MS-MoE) and Feature Constrained Replay (FCR). MS-MoE captures both task-shared and task-specific knowledge at different learning stages of the network, including the two modality encoders and the modality fusion component, thereby learning inter-modal relationships. FCR further consolidates learned knowledge and mitigates forgetting by restricting the feature deviation of previous data at multiple levels, including encoder-level features of both modalities and the fusion-level representations. Additionally, we introduce Multimodal Survival Analysis Incremental Learning (MSAIL), a new benchmark integrating four datasets, for comprehensive evaluation in the CL setting. Extensive experiments demonstrate that ConSurv outperforms competing methods across multiple metrics.
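To make the FCR idea concrete, here is a minimal sketch of a feature-constraint penalty, assuming (as the abstract describes) that features of replayed samples are stored at three levels — the WSI encoder, the genomics encoder, and the fusion layer — and that the constraint is an L2 (MSE) distillation term; the dictionary keys, the choice of MSE, and the equal weighting are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two feature arrays."""
    return float(np.mean((a - b) ** 2))

def fcr_penalty(current_feats, stored_feats):
    """Feature Constrained Replay penalty (sketch): sum of MSEs between the
    model's current features for replayed samples and the features stored
    when those samples were first learned, at the WSI-encoder,
    genomics-encoder, and fusion levels."""
    return sum(mse(current_feats[k], stored_feats[k])
               for k in ("wsi", "genomics", "fusion"))
```

In training, this penalty would be added to the survival loss on the current task, so that updates which drift the representations of earlier datasets are directly penalized at both the encoder and fusion stages.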