Single-cell RNA sequencing (scRNA-seq) is a relatively new technology that has stimulated enormous interest in statistics, data science, and computational biology because scRNA-seq data are high-dimensional, complex, and large in scale. Nonnegative matrix factorization (NMF) offers a distinctive approach because its low-dimensional components can be interpreted as meta-genes. However, existing NMF approaches lack multiscale analysis. This work introduces two persistent Laplacian regularized NMF methods: topological NMF (TNMF) and robust topological NMF (rTNMF). Using a total of 12 datasets, we demonstrate that the proposed TNMF and rTNMF significantly outperform all other NMF-based methods. We also use TNMF and rTNMF for visualization with the popular Uniform Manifold Approximation and Projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) methods.
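To make the idea of Laplacian-regularized NMF concrete, the following is a minimal sketch of the standard single-scale graph-regularized NMF with multiplicative updates. It is not the TNMF or rTNMF algorithm: the abstract does not give the update rules, and here an ordinary kNN graph Laplacian stands in for the persistent Laplacian. The function names, the objective $\|X - WH\|_F^2 + \lambda\,\mathrm{Tr}(H L H^\top)$, and all parameter values are assumptions chosen for illustration, with $X$ taken as a nonnegative genes-by-cells matrix.

```python
import numpy as np

def knn_adjacency(X, n_neighbors=5):
    """Symmetric 0/1 kNN adjacency between the columns (cells) of X.

    Assumption: a plain single-scale kNN graph, used here only as a
    stand-in for the multiscale persistent Laplacian.
    """
    cells = X.T
    sq = np.sum(cells ** 2, axis=1)
    # Pairwise squared Euclidean distances between cells.
    d2 = sq[:, None] + sq[None, :] - 2.0 * cells @ cells.T
    np.fill_diagonal(d2, np.inf)  # exclude self-neighbors
    idx = np.argsort(d2, axis=1)[:, :n_neighbors]
    n = cells.shape[0]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), n_neighbors), idx.ravel()] = 1.0
    return np.maximum(A, A.T)  # symmetrize

def graph_regularized_nmf(X, rank, lam=1.0, n_iter=200, seed=0, eps=1e-10):
    """Factor X (genes x cells, nonnegative) as W @ H with a graph penalty on H.

    W (genes x rank) holds the meta-genes; H (rank x cells) holds the
    low-dimensional cell representation.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    A = knn_adjacency(X)
    D = np.diag(A.sum(axis=1))  # degree matrix, so L = D - A
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        H *= (W.T @ X + lam * H @ A) / ((W.T @ W) @ H + lam * H @ D + eps)
    return W, H
```

In this sketch, the columns of `W` play the role of meta-genes, and the columns of `H` give the low-dimensional cell embedding that could then be passed to a clustering or visualization step.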