Spatial transcriptomics measures the expression of thousands of genes in a tissue sample while preserving its spatial structure. This class of technologies has enabled the investigation of spatial variation in gene expression and its impact on specific biological processes. Identifying genes with similar expression profiles is of great importance, which motivates the development of flexible methods that leverage the spatial structure of the data to cluster genes. Here, we propose a modeling framework for clustering observations measured over numerous spatial locations via Gaussian processes. Rather than specifying their covariance kernels as a function of the spatial structure, we use that structure to inform a generalized Cholesky decomposition of their precision matrices. This approach avoids issues with kernel misspecification and facilitates the estimation of a non-stationary spatial covariance structure. Applied to spatial transcriptomic data, our model identifies gene clusters with distinctive spatial correlation patterns across tissue areas comprising different cell types, such as tumoral and stromal areas.
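As a rough illustration of how spatial structure might inform a Cholesky-type factorization of a precision matrix, the sketch below uses the standard modified Cholesky form, Omega = T' D^{-1} T, with T unit lower triangular and D a diagonal matrix of positive innovation variances. The rule that coefficients may be nonzero only between locations within a fixed radius, along with the names `neighbor_mask`, `precision_from_cholesky`, `radius`, `phi`, and `innov_var`, are assumptions made for this example and are not taken from the abstract; the coordinates and coefficients are simulated.

```python
import numpy as np


def neighbor_mask(coords, radius):
    """Boolean (n, n) mask: entry (j, k) is True when k < j and
    location k lies within `radius` of location j (assumed rule)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.tril(dist <= radius, k=-1)  # keep strictly lower triangle


def precision_from_cholesky(phi, innov_var, mask):
    """Build Omega = T' D^{-1} T, where T = I - (phi restricted to mask)
    and D = diag(innov_var) holds positive innovation variances."""
    n = len(innov_var)
    T = np.eye(n) - np.where(mask, phi, 0.0)  # unit lower triangular
    return T.T @ np.diag(1.0 / innov_var) @ T


rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(6, 2))     # simulated spot coordinates
mask = neighbor_mask(coords, radius=0.5)    # hypothetical neighborhood radius
phi = rng.normal(scale=0.3, size=(6, 6))    # unconstrained coefficients
innov_var = rng.uniform(0.5, 1.5, size=6)   # positive innovation variances
omega = precision_from_cholesky(phi, innov_var, mask)
```

Because T is unit lower triangular and D has positive entries, the resulting matrix is symmetric positive definite for any values of the coefficients, and the coefficients can differ from location to location, which is one way a non-stationary covariance can arise without choosing a kernel.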