Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet how well they generalize under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) multimodal oncology dataset. The analysis focuses on two modalities: whole-slide images and transcriptomic profiles. We first benchmark the unimodal probing performance of five FMs on eight downstream classification tasks and find that image and omics representations carry complementary predictive signals. We then investigate whether multimodal fusion yields additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. Finally, we assess the trustworthiness of selected unimodal and multimodal pipelines through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.
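To make the conformal prediction claim concrete, the following is a minimal sketch and not the authors' pipeline. The abstract does not say which conformal procedure is used, so the sketch assumes split conformal prediction with the common score 1 - p(true class), computed on a held-out calibration set. The function names (`conformal_threshold`, `prediction_sets`, `recovery_rate`) and the miscoverage level `alpha` are hypothetical and chosen for illustration only.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold (assumed procedure, not taken from the paper).

    cal_probs:  (n, K) predicted class probabilities on a calibration set
    cal_labels: (n,) integer true labels
    """
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return np.sort(scores)[k - 1]

def prediction_sets(test_probs, qhat):
    """Boolean (m, K) mask: class k is in the set when 1 - p_k <= qhat."""
    return (1.0 - test_probs) <= qhat

def recovery_rate(test_probs, test_labels, sets):
    """Among samples whose top-1 prediction is wrong, the fraction whose
    true label is still contained in the prediction set."""
    wrong = test_probs.argmax(axis=1) != test_labels
    if not wrong.any():
        return float("nan")
    return float(sets[wrong, test_labels[wrong]].mean())
```

Under this assumption, `recovery_rate` corresponds to the quantity the abstract describes: the share of failed point predictions for which the true diagnosis is still inside the prediction set.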