Spatial transcriptomics (ST) technologies enable transcriptome-wide gene expression profiling while preserving spatial resolution, offering unprecedented opportunities to uncover complex spatial structures. Because ST data are ultra-high-dimensional, identifying spatially variable genes (SVGs) associated with unknown spatial clusters has become a central task in ST data analysis. Here, we develop a distribution-free SVG screening method based on a novel quasi-likelihood ratio statistic, the MM-test, combined with a knockoff procedure to control the false discovery rate (FDR). The MM-test leverages auxiliary information about the unknown spatial domains, such as spatial distances, to screen for SVGs. Notably, in addition to two-dimensional ST datasets, the MM-test is well suited to the increasingly common three-dimensional (3D), multi-slice ST datasets. Extensive benchmarking on simulations and 34 real ST datasets demonstrates that the MM-test consistently outperforms existing SVG detection methods. In a 3D mouse brain dataset, the MM-test accurately delineates fine-scale structures that are challenging for other methods, such as the 3D architecture of the pyramidal layer of the hippocampal cornu ammonis and of the dentate gyrus. Theoretical guarantees, including selection consistency, FDR control, and an error bound for post-selection clustering, are also established.
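As a rough illustration of how a knockoff procedure can control the FDR, the sketch below implements the generic knockoff+ selection rule of Barber and Candès. It is not necessarily the exact procedure used with the MM-test. The input `W` is a hypothetical vector of per-gene statistics, assumed here to be large and positive when a gene scores higher than its knockoff copy and symmetric around zero for null genes. The function name and the parameter `q` (the target FDR level) are likewise illustrative.

```python
import numpy as np


def knockoff_plus_select(W, q=0.1):
    """Return indices selected by the generic knockoff+ rule at target FDR level q.

    W is a hypothetical vector of knockoff statistics, one per gene (an
    assumption for illustration, not the paper's exact construction).
    """
    W = np.asarray(W, dtype=float)
    # Candidate thresholds: the distinct nonzero magnitudes of W, in ascending order.
    candidates = np.sort(np.unique(np.abs(W[W != 0])))
    for t in candidates:
        # Estimated false discovery proportion at threshold t.
        # The "+1" in the numerator is what distinguishes knockoff+ from the plain rule.
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            # Smallest threshold meeting the target: select everything at or above it.
            return np.flatnonzero(W >= t)
    # No threshold achieves the target level, so nothing is selected.
    return np.array([], dtype=int)


# Toy statistics: six strong positive values, a few small mixed-sign values.
W_example = [5.0, 4.0, 3.5, 3.0, 2.5, 2.0, -1.0, 0.5, -0.2, 0.1]
selected = knockoff_plus_select(W_example, q=0.2)
print(selected.tolist())
```

The rule scans thresholds from small to large and stops at the first one whose estimated false discovery proportion falls below `q`, which keeps as many genes as possible while respecting the target level.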