Spatial transcriptomics (ST) links gene expression with tissue morphology but remains expensive and low-throughput, motivating surrogates that infer expression from routine histology. Whole-slide H&E-to-ST inference pairs a gigapixel image with gene measurements at a sparse, irregular set of locations, which makes it difficult to capture multiscale context without incurring dense-grid overhead or quadratic token mixing. We propose HiST, a hierarchical sparse transformer that treats measured locations as a lattice-indexed sparse field and builds a dyadic encoder--decoder directly on the active tissue footprint. HiST combines sparse window attention for local geometric correspondence with resolution-changing operators for rapid multiscale context integration. For a fixed window size, the dominant runtime and memory costs scale with the number of observed locations rather than with the dense slide area. To mitigate slide-specific acquisition variation, HiST adds a bottlenecked global conditioning pathway built around a \emph{slide calibration token}, which summarizes slide-level context and conditions local representations. On a multi-organ benchmark spanning diverse tissues and acquisition sources, HiST improves predictive performance over recent baselines while reducing runtime and peak memory.
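The mechanism summarized above (window attention over a lattice-indexed sparse field, dyadic resolution changes, and a bottlenecked slide-level token) can be illustrated with a minimal pure-Python sketch. It is not HiST's implementation. Every design choice here is our own assumption: non-overlapping square windows, identity query/key/value projections, mean pooling for the 2x coarsening, and a FiLM-style scale-and-shift for the calibration token. The function names (`window_groups`, `attention_pairs`, `window_attention`, `dyadic_downsample`, `calibrate`) are hypothetical. What the sketch shows is that tiles and coarse cells exist only where measured sites exist, so the number of attention pairs is bounded by N * window**2 regardless of slide extent.

```python
import math
from collections import defaultdict

# Hypothetical sketch, not HiST's implementation. Sites are integer lattice
# coordinates (i, j); only measured (active) sites are stored.


def window_groups(coords, window):
    """Group active sites into non-overlapping window x window tiles.
    Only occupied tiles are materialized, so work tracks the number of sites."""
    groups = defaultdict(list)
    for idx, (i, j) in enumerate(coords):
        groups[(i // window, j // window)].append(idx)
    return dict(groups)


def attention_pairs(coords, window):
    """Query-key pairs evaluated by window attention (at most N * window**2)."""
    return sum(len(m) ** 2 for m in window_groups(coords, window).values())


def window_attention(coords, feats, window):
    """Single-head softmax self-attention restricted to each occupied tile.
    Identity Q/K/V projections are an assumption made for brevity."""
    d = len(feats[0])
    out = [None] * len(feats)
    for members in window_groups(coords, window).values():
        for q in members:
            scores = [sum(a * b for a, b in zip(feats[q], feats[k])) / math.sqrt(d)
                      for k in members]
            top = max(scores)  # subtract the max for numerical stability
            weights = [math.exp(s - top) for s in scores]
            z = sum(weights)
            out[q] = [sum(w * feats[k][c] for w, k in zip(weights, members)) / z
                      for c in range(d)]
    return out


def dyadic_downsample(coords, feats):
    """Halve lattice resolution: sites sharing (i // 2, j // 2) are mean-pooled.
    Empty parent cells never appear, so the coarse field stays sparse."""
    buckets = defaultdict(list)
    for (i, j), f in zip(coords, feats):
        buckets[(i // 2, j // 2)].append(f)
    new_coords = sorted(buckets)
    new_feats = []
    for c in new_coords:
        members = buckets[c]
        new_feats.append([sum(col) / len(members) for col in zip(*members)])
    return new_coords, new_feats


def calibrate(feats, w_down, w_scale, w_shift):
    """Hypothetical FiLM-style slide token: mean-pool every site into a summary,
    squeeze it through a k-dim bottleneck (w_down is k x d), then map the token
    to per-channel scale and shift (w_scale and w_shift are d x k)."""
    d = len(feats[0])
    summary = [sum(f[c] for f in feats) / len(feats) for c in range(d)]
    token = [sum(w * s for w, s in zip(row, summary)) for row in w_down]
    scale = [sum(w * t for w, t in zip(row, token)) for row in w_scale]
    shift = [sum(w * t for w, t in zip(row, token)) for row in w_shift]
    return [[(1 + a) * x + b for x, a, b in zip(f, scale, shift)] for f in feats]


if __name__ == "__main__":
    # Four nearby sites plus one far away on a 40k x 40k lattice.
    coords = [(0, 0), (0, 1), (1, 1), (2, 2), (40_000, 40_000)]
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [2.0, 0.0]]
    h = window_attention(coords, feats, window=4)
    coarse_coords, coarse_feats = dyadic_downsample(coords, h)
    print(len(coarse_coords), attention_pairs(coords, window=4))  # prints: 3 17
```

Because tiles are keyed by integer division, the isolated site at (40000, 40000) adds one tile and one attention pair, whereas a dense-grid implementation would allocate the entire lattice between it and the origin.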