The advent of next-generation sequencing-based spatially resolved transcriptomics (SRT) techniques has reshaped genomic studies by enabling high-throughput gene expression profiling while preserving spatial and morphological context. Nevertheless, these new high-dimensional spatial data pose inherent challenges, such as zero-inflation, over-dispersion, and heterogeneity. These challenges hinder effective clustering, which is a fundamental problem in SRT data analysis. Current computational approaches often rely on heuristic data preprocessing and arbitrary prespecification of the number of clusters, leading to considerable information loss and, consequently, suboptimal downstream analysis. In response to these challenges, we introduce BNPSpace, a novel Bayesian nonparametric spatial clustering framework that directly models SRT count data. BNPSpace partitions the whole spatial domain, which is characterized by substantial heterogeneity, into homogeneous spatial domains with similar molecular characteristics while identifying a parsimonious set of genes that discriminate among those domains. Moreover, BNPSpace incorporates spatial information through a Markov random field prior model, encouraging a smooth and biologically meaningful partition pattern.