Principal Component Analysis in Space Forms

Principal Component Analysis (PCA) is a workhorse of modern data science. PCA assumes that the data conforms to Euclidean geometry, but for specific data types, such as hierarchical and cyclic data structures, other spaces are more appropriate. We study PCA in space forms, that is, spaces of constant curvature. At a point on a Riemannian manifold, we can define a Riemannian affine subspace based on a set of tangent vectors. Finding the optimal low-dimensional affine subspace for given points in a space form amounts to dimensionality reduction. Our Space Form PCA (SFPCA) seeks the affine subspace that best represents a set of manifold-valued points, in the sense of minimum projection cost. We propose proper cost functions that enjoy two properties: (1) their optimal affine subspace is the solution to an eigenequation, and (2) optimal affine subspaces of different dimensions form a nested set. These properties are an advance over existing methods, which are mostly iterative algorithms with slow convergence and weaker theoretical guarantees. We evaluate the proposed SFPCA on real and simulated data in spherical and hyperbolic spaces. On simulated data, where the true subspace is known, it outperforms alternative methods in estimating that subspace with respect to convergence speed or accuracy, and often both.
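To make the two properties concrete, here is a minimal sketch for the spherical case only. It is not the authors' SFPCA implementation, and the cost used here (the mean squared sine of the geodesic distance to the subspace) is an assumption chosen because its minimizer is easy to verify; the abstract does not specify the paper's cost functions. On the unit sphere, a Riemannian affine subspace is a great subsphere, which is the intersection of the sphere with a linear subspace. Under the assumed cost, the best d-dimensional great subsphere is spanned by the top d+1 eigenvectors of the second-moment matrix, so the solutions for different d are nested. The function names are hypothetical.

```python
import numpy as np

def fit_great_subsphere(X, d):
    """Fit a d-dimensional great subsphere to points on the unit sphere.

    X has shape (n, D+1), with unit-norm rows. Returns a (D+1, d+1) matrix
    whose orthonormal columns span the linear subspace that cuts out the
    great subsphere. Assumed cost: mean sin^2 of the geodesic distance.
    """
    second_moment = X.T @ X / X.shape[0]
    # eigh returns eigenvalues in ascending order, so reverse the columns.
    _, eigvecs = np.linalg.eigh(second_moment)
    return eigvecs[:, ::-1][:, : d + 1]

def project_to_subsphere(X, B):
    """Project onto span(B), then renormalize back onto the sphere.

    Assumes no point is orthogonal to span(B), so no norm is zero.
    """
    Y = X @ B @ B.T
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

def geodesic_distance(X, Y):
    """Great-circle distance between corresponding unit-norm rows."""
    cosines = np.clip(np.sum(X * Y, axis=1), -1.0, 1.0)
    return np.arccos(cosines)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic points scattered around the equator of the 2-sphere.
    angles = rng.uniform(0.0, 2.0 * np.pi, size=500)
    X = np.stack(
        [np.cos(angles), np.sin(angles), 0.05 * rng.standard_normal(500)],
        axis=1,
    )
    X /= np.linalg.norm(X, axis=1, keepdims=True)

    B = fit_great_subsphere(X, d=1)  # best-fitting great circle
    P = project_to_subsphere(X, B)
    print("mean geodesic distance:", geodesic_distance(X, P).mean())
```

Because every basis comes from the same eigendecomposition, the basis for d = 1 consists of the first two columns of the basis for d = 2. That is the nesting property in this simplified setting. The hyperbolic case needs the Lorentzian inner product and is not covered by this sketch.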

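PCA

In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional one while retaining as much variance as possible along each kept dimension. Given a set of points in two, three, or more dimensions, the "best-fitting" line is the one that minimizes the average squared distance from the points to the line. The next best-fitting line is chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called the principal components.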
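The procedure described above can be sketched with an eigendecomposition of the sample covariance matrix. This is a minimal illustration of classical Euclidean PCA, assuming NumPy; the function names `pca` and `transform` are hypothetical and not tied to any particular library.

```python
import numpy as np

def pca(X, k):
    """Return the mean, the top-k principal components, and their variances.

    X has shape (n_samples, n_features). The rows of the returned
    `components` array are orthonormal directions, by decreasing variance.
    """
    mean = X.mean(axis=0)
    centered = X - mean
    covariance = centered.T @ centered / (X.shape[0] - 1)
    # eigh is intended for symmetric matrices; eigenvalues come out ascending.
    eigvals, eigvecs = np.linalg.eigh(covariance)
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order].T, eigvals[order]

def transform(X, mean, components):
    """Coordinates of X in the principal-component basis."""
    return (X - mean) @ components.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data, stretched far more along one axis than the others.
    X = rng.standard_normal((300, 3)) * np.array([5.0, 1.0, 0.2])
    mean, components, variances = pca(X, k=2)
    scores = transform(X, mean, components)
    print("variance along each component:", variances)
    print("covariance of the scores:")
    print(np.cov(scores, rowvar=False))
```

The covariance of the scores is diagonal up to rounding error, which is the "uncorrelated dimensions" property stated above. Its diagonal entries are the eigenvalues returned by `pca`.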
