Spatial transcriptomics (ST) data produced by recent technologies such as CosMx and Xenium contain a large amount of information about cancer tissue samples. These data have great potential for cancer research through community detection, where a community is a collection of cells with a distinct cell-type composition and similar neighboring patterns. However, existing clustering methods do not work well for community detection in CosMx ST data, and the commonly used kNN compositional data method yields neighboring-cell patterns that are not informative enough for large CosMx datasets. In this article, we propose a novel and more informative disk compositional data (DCD) method, which identifies the neighboring patterns of each cell by taking into account the features of ST data from these recent technologies. After the ST data are initially processed into a DCD matrix, we propose a new, interpretable community detection method, DCD-TMHC. Extensive simulation studies and an analysis of CosMx breast cancer data show that the proposed DCD-TMHC method outperforms other methods. Logistic regression analysis based on the communities detected by DCD-TMHC in the CosMx breast cancer data demonstrates that the method is interpretable and performs better, especially in assessing different stages of cancer. These results suggest that the proposed DCD-TMHC method will be helpful for future cancer research based on ST data, which could improve cancer diagnosis and the monitoring of cancer treatment progress.
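To make the idea of a disk-based composition concrete, the following is a minimal sketch and not the authors' implementation. It assumes one plausible reading of the term: for each cell, take all cells within a fixed radius of its position and record the proportion of each cell type among them. The function name `disk_composition`, the `radius` parameter, the choice to count the center cell itself, and the toy coordinates and labels are all hypothetical; the abstract does not specify these details.

```python
from math import hypot


def disk_composition(coords, cell_types, radius):
    """Return (types, rows), where rows[i] holds the cell-type proportions
    among all cells lying within `radius` of cell i.

    Assumption: the disk is centered on the cell and includes the cell
    itself, so every row sums to 1. The paper's definition may differ.
    """
    types = sorted(set(cell_types))
    index = {t: j for j, t in enumerate(types)}
    rows = []
    for xi, yi in coords:
        counts = [0] * len(types)
        # Brute-force scan over all cells; fine for a toy example only.
        for (xj, yj), t in zip(coords, cell_types):
            if hypot(xi - xj, yi - yj) <= radius:
                counts[index[t]] += 1
        total = sum(counts)
        rows.append([c / total for c in counts])
    return types, rows


# Toy data (hypothetical): three nearby cells and one isolated cell.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
cell_types = ["tumor", "tumor", "immune", "stroma"]

types, dcd = disk_composition(coords, cell_types, radius=1.5)
for row in dcd:
    print(dict(zip(types, row)))
```

Unlike a kNN neighborhood, which always contains exactly k cells regardless of how far away they are, a fixed-radius disk contains however many cells fall inside it, so the isolated cell in the toy data ends up with a composition made only of itself. The quadratic scan above would not scale to large datasets; a spatial index would be the usual replacement.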