Spatial transcriptomics (ST) provides essential spatial context by mapping gene expression within tissue, enabling detailed study of cellular heterogeneity and tissue organization. However, aligning ST data with histology images poses challenges due to inherent spatial distortions and modality-specific variations. Existing methods largely rely on direct alignment, which often fails to capture complex cross-modal relationships. To address these limitations, we propose a novel framework that aligns gene and image features using a ranking-based alignment loss, preserving relative similarity across modalities and enabling robust multi-scale alignment. To further enhance the alignment's stability, we employ self-supervised knowledge distillation with a teacher-student network architecture, effectively mitigating disruptions from high dimensionality, sparsity, and noise in gene expression data. Extensive experiments on gene expression prediction and survival analysis demonstrate our framework's effectiveness, showing improved alignment and predictive performance over existing methods and establishing a robust tool for gene-guided image representation learning in digital pathology.
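The two core ideas — a ranking-based alignment loss that preserves relative cross-modal similarity, and an exponential-moving-average (EMA) teacher for stable distillation targets — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact formulation: the margin value, the hinge-over-negatives form, and the function names are assumptions.

```python
import numpy as np

def ranking_alignment_loss(img_feats, gene_feats, margin=0.2):
    """Margin-ranking sketch of a cross-modal alignment loss.

    For each image embedding, the matched gene embedding (same row index)
    should score higher than every non-matching one by at least `margin`.
    This constrains the *ordering* of similarities rather than forcing
    exact feature agreement, which is the idea behind ranking-based
    alignment. (Hypothetical formulation, not the paper's.)
    """
    # L2-normalize so the dot product is cosine similarity
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    gene = gene_feats / np.linalg.norm(gene_feats, axis=1, keepdims=True)
    sim = img @ gene.T                       # (N, N) similarity matrix
    pos = np.diag(sim)[:, None]              # matched pairs on the diagonal
    hinge = np.maximum(0.0, margin - pos + sim)  # penalize negatives within margin
    off_diag = ~np.eye(sim.shape[0], dtype=bool)
    return hinge[off_diag].mean()

def ema_update(teacher_w, student_w, momentum=0.99):
    """Teacher-student distillation step: the teacher's weights are an
    exponential moving average of the student's, yielding slowly-moving,
    stable targets against noisy, sparse gene expression inputs."""
    return momentum * teacher_w + (1 - momentum) * student_w
```

In a training loop, the student would be updated by gradient descent on the alignment loss while the teacher is refreshed with `ema_update` each step; the teacher's outputs then serve as the distillation targets.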