Spatially resolved transcriptomics brings exciting breakthroughs to single-cell analysis by providing physical locations along with gene expression. However, as the cost of its extremely high spatial resolution, cellular-level spatial transcriptomic data suffer significantly from missing values. While a standard solution is to impute the missing values, most existing methods either overlook spatial information or incorporate only localized spatial context, and so cannot capture long-range spatial information. Using multi-head self-attention mechanisms and positional encoding, transformer models can readily grasp the relationships between tokens and encode location information. In this paper, by treating single cells as spatial tokens, we study how to leverage transformers to facilitate spatial transcriptomics imputation. In particular, we investigate the following two key questions: (1) $\textit{how to encode spatial information of cells in transformers}$, and (2) $\textit{how to train a transformer for transcriptomic imputation}$. By answering these two questions, we present a transformer-based imputation framework, SpaFormer, for cellular-level spatial transcriptomic data. Extensive experiments demonstrate that SpaFormer outperforms existing state-of-the-art imputation algorithms on three large-scale datasets while maintaining superior computational efficiency.