Towards Spatial Transcriptomics-driven Pathology Foundation Models

Konstantin Hemker, Andrew H. Song, Cristina Almagro-Pérez, Guillaume Jaume, Sophia J. Wagner, Anurag Vaidya, Nikola Simidjievski, Mateja Jamnik, Faisal Mahmood

Spatial transcriptomics (ST) provides spatially resolved measurements of gene expression, enabling characterization of the molecular landscape of human tissue beyond histological assessment and yielding localized readouts that can be aligned with morphology. Concurrently, the success of multimodal foundation models that integrate vision with complementary modalities suggests that morphomolecular coupling between local expression and morphology can be systematically used to improve histological representations themselves. We introduce Spatial Expression-Aligned Learning (SEAL), a vision-omics self-supervised learning framework that infuses localized molecular information into pathology vision encoders. Rather than training new encoders from scratch, SEAL is designed as a parameter-efficient vision-omics finetuning method that can be flexibly applied to widely used pathology foundation models. We instantiate SEAL by training on over 700,000 paired gene expression spot-tissue region examples spanning tumor and normal samples from 14 organs. Tested across 38 slide-level and 15 patch-level downstream tasks, SEAL provides a drop-in replacement for pathology foundation models that consistently improves performance over widely used vision-only and ST prediction baselines on slide-level molecular status, pathway activity, and treatment response prediction, as well as patch-level gene expression prediction tasks. Additionally, SEAL encoders exhibit robust domain generalization on out-of-distribution evaluations and enable new cross-modal capabilities such as gene-to-image retrieval. Our work proposes a general framework for ST-guided finetuning of pathology foundation models, showing that augmenting existing models with localized molecular supervision is an effective and practical step for improving visual representations and expanding their cross-modal utility.
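
To make the gene-to-image retrieval capability mentioned above more concrete, the following is a minimal sketch of how such retrieval could work in principle. It assumes, hypothetically, that an expression query and a set of tissue patches have already been mapped into a shared embedding space, and ranks patches by cosine similarity to the query. The function names (`cosine`, `retrieve_top_k`) and the toy vectors are illustrative assumptions, not the SEAL implementation or data from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve_top_k(query_embedding, patch_embeddings, k=2):
    """Return indices of the k patches most similar to the query, best first."""
    scores = [cosine(query_embedding, p) for p in patch_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Toy, made-up embeddings (assumed to live in a shared space; not paper data).
expression_query = [1.0, 0.0, 0.5]
patch_embeddings = [
    [0.0, 1.0, 0.0],    # patch 0: orthogonal to the query
    [0.9, 0.1, 0.4],    # patch 1: nearly parallel to the query
    [-1.0, 0.0, -0.5],  # patch 2: opposite direction
    [0.5, 0.5, 0.5],    # patch 3: partially aligned
]

top = retrieve_top_k(expression_query, patch_embeddings, k=2)
print(top)
```

In this toy setup the ranking depends only on the direction of each embedding, so any encoder pair that produces comparable vectors could be swapped in without changing the retrieval step.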
