Identifying genes that display spatial expression patterns is critical to investigating expression interactions within a spatial context and to understanding complex biological mechanisms. Although a growing number of statistical methods have been designed to identify spatially variable genes, most are based on marginal analysis and share a common limitation: the dependence (network) structures among genes are not well accommodated, even though a biological process usually involves changes in multiple genes that interact in a complex network. Moreover, the latent cellular composition within spots may introduce confounding variations that reduce identification accuracy. In this study, we develop a novel Bayesian regularization approach for spatial transcriptomic data that corrects for the confounding variations induced by varying cellular distributions. Advancing beyond existing studies, we propose a thresholded graph Laplacian regularization that simultaneously identifies spatially variable genes and accommodates the network structure among genes. The proposed method is based on a zero-inflated negative binomial distribution, which accommodates the count nature, zero inflation, and overdispersion of spatial transcriptomic data. Extensive simulations and an application to real data demonstrate the competitive performance of the proposed method.
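To make the two building blocks named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `phi`, variance `mu + mu**2 / phi`) with a structural-zero probability `pi`, and it uses the standard unnormalized graph Laplacian quadratic form as the network penalty. The abstract does not specify the parameterization, the priors, or the thresholding rule, so those names and choices are assumptions and the thresholding step is not shown.

```python
import math


def nb_log_pmf(y, mu, phi):
    """Log-pmf of a negative binomial count y with mean mu > 0 and
    dispersion phi > 0 (assumed parameterization: var = mu + mu^2 / phi)."""
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


def zinb_log_pmf(y, mu, phi, pi):
    """Log-pmf of a zero-inflated negative binomial.

    pi (0 <= pi < 1) is the probability of a structural zero; with
    probability 1 - pi the count is drawn from NB(mu, phi).
    """
    nb = nb_log_pmf(y, mu, phi)
    if y == 0:
        # A zero is either structural or a sampled NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def laplacian_penalty(beta, edges):
    """Quadratic form beta^T L beta for the unnormalized Laplacian L = D - W.

    edges is a list of (i, j, weight) tuples describing the gene network;
    the form equals the weighted sum of squared differences across edges,
    so connected genes are encouraged to have similar coefficients.
    """
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    print(zinb_log_pmf(0, mu=2.0, phi=1.5, pi=0.3))
    print(laplacian_penalty([1.0, 1.0, 0.0], [(0, 1, 1.0), (1, 2, 2.0)]))
```

The zero-inflation term raises the probability of observing a zero above what the negative binomial alone would give, while the dispersion parameter lets the variance exceed the mean; together these correspond to the zero inflation and overdispersion mentioned in the abstract. The penalty is zero when all connected genes share the same coefficient and grows as neighboring genes diverge.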