We propose a novel method for finding principal components of multivariate data sets that lie on an embedded nonlinear Riemannian manifold within a higher-dimensional space. Our aim is to extend the geometric interpretation of PCA while capturing non-geodesic modes of variation in the data. We introduce the concept of a principal sub-manifold. This is a manifold that passes through a reference point and, at every point on it, extends in the direction of highest variation within the space spanned by the eigenvectors of the local tangent space PCA. Recent work by Panaretos et al. (2014) on the principal flow treats the case in which the sub-manifold has dimension one, essentially a curve lying on the manifold that attempts to capture one-dimensional variation. The present setting is much more general. The principal sub-manifold is therefore an extension of the principal flow that can capture higher-dimensional variation in the data. We show that, in Euclidean space, the principal sub-manifold yields the ball spanned by the usual principal components. By means of examples, we illustrate how to find, use, and interpret a principal sub-manifold, and we present an application in shape analysis.
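The abstract relies on a local tangent space PCA. The sketch below illustrates that building block in the simplest curved setting, the unit sphere S^2 in R^3, and rests on our own assumptions rather than on the authors' construction. The Gaussian kernel, the bandwidth `h`, and the helper names `sphere_log` and `local_tangent_pca` are hypothetical. The kernel-weighted, uncentred second-moment matrix of log-mapped points follows the local covariance idea behind the principal flow only in spirit.

```python
import numpy as np


def sphere_log(p, x):
    """Log map on the unit sphere: rows of the result are tangent vectors at p pointing to each x_i."""
    cos_t = np.clip(x @ p, -1.0, 1.0)            # cosines of geodesic distances to p
    theta = np.arccos(cos_t)                      # geodesic distances to p
    v = x - cos_t[:, None] * p                    # project each x_i onto the tangent plane at p
    norms = np.linalg.norm(v, axis=1)
    # Rescale each projected vector to length theta; guard the degenerate cases x_i = p and x_i = -p.
    scale = np.where(norms > 1e-12, theta / np.maximum(norms, 1e-12), 0.0)
    return v * scale[:, None]


def local_tangent_pca(p, x, h):
    """Eigen-decomposition of a kernel-weighted second-moment matrix of log-mapped data at p.

    h is a hypothetical Gaussian bandwidth that controls how local the PCA is.
    Returns eigenvalues in descending order and the matching eigenvectors as columns.
    """
    v = sphere_log(p, x)
    d = np.linalg.norm(v, axis=1)                 # equals the geodesic distance to p
    w = np.exp(-0.5 * (d / h) ** 2)
    w = w / w.sum()
    cov = (v * w[:, None]).T @ v                  # 3x3 matrix; p lies in its null space
    vals, vecs = np.linalg.eigh(cov)              # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


# Usage: points spread along a great circle through p = (0, 0, 1) in the x-z plane,
# with a little noise in the y direction.
rng = np.random.default_rng(0)
t = rng.uniform(-0.5, 0.5, 200)
s = 0.05 * rng.standard_normal(200)
x = np.column_stack([np.sin(t), s, np.cos(t)])
x /= np.linalg.norm(x, axis=1, keepdims=True)
p = np.array([0.0, 0.0, 1.0])

vals, vecs = local_tangent_pca(p, x, h=0.3)
print(vals)
print(vecs[:, 0])  # leading tangent direction, close to +/-(1, 0, 0)
```

Every log-mapped vector is orthogonal to `p`, so the normal direction always receives eigenvalue zero. The leading eigenvectors span the tangent directions along which, in this simplified picture, a principal sub-manifold through `p` would be extended.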