Modern datasets are often high-dimensional, yet the data frequently lie on or near low-dimensional manifolds whose geometric structure is critical for data analysis. A prime example is a collection of cell cycle measurements, where the inherently cyclical nature of the process can be represented as a circle or sphere. Motivated by the need to analyze such datasets, we propose a nonlinear dimension reduction method, Spherical Rotation Component Analysis (SRCA), that incorporates geometric information to better approximate low-dimensional manifolds. SRCA is a versatile method designed to work in both high-dimensional and small-sample-size settings. By employing spheres or ellipsoids, SRCA provides a low-rank spherical representation of the data with general theoretical guarantees, effectively retaining the geometric structure of the dataset during dimensionality reduction. A comprehensive simulation study, along with an application to human cell cycle data, further highlights the advantages of SRCA over state-of-the-art alternatives, demonstrating superior performance in approximating the manifold while preserving its inherent geometric structure.
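To make the idea of a spherical representation concrete, the following is a minimal sketch of fitting a sphere to points and projecting them onto it. It is not the SRCA algorithm itself, whose details are not given here; it uses a standard algebraic least-squares sphere fit as a stand-in, and the function names `fit_sphere` and `project_to_sphere` are hypothetical.

```python
import numpy as np


def fit_sphere(X):
    """Algebraic least-squares sphere fit (illustrative stand-in, not SRCA).

    Uses the identity ||x||^2 = 2 c.x + (r^2 - ||c||^2), which is linear
    in the unknowns (c, r^2 - ||c||^2). Returns (center, radius).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    A = np.hstack([2.0 * X, np.ones((n, 1))])  # design matrix [2x, 1]
    b = np.sum(X**2, axis=1)                   # squared norms ||x||^2
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = w[:d]
    radius = np.sqrt(w[d] + center @ center)
    return center, radius


def project_to_sphere(X, center, radius):
    """Move each point radially onto the fitted sphere.

    Assumes no point coincides exactly with the center.
    """
    V = np.asarray(X, dtype=float) - center
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    return center + radius * V / norms


# Toy data (assumed for illustration): noise-free points on a circle
# with center (1, -2) and radius 3.
theta = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
X = np.column_stack([1.0 + 3.0 * np.cos(theta), -2.0 + 3.0 * np.sin(theta)])

center, radius = fit_sphere(X)
X_on_sphere = project_to_sphere(X, center, radius)
```

On cyclical data such as the toy circle above, the projected points keep their angular ordering, which is the kind of geometric structure a linear projection can lose.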