Stratified Principal Component Analysis

This paper investigates a general family of models that stratifies the space of covariance matrices by eigenvalue multiplicity. This family, coined Stratified Principal Component Analysis (SPCA), includes in particular Probabilistic PCA (PPCA) models, where the noise component is assumed to be isotropic. We provide an explicit maximum likelihood estimate and a geometric characterization relying on flag manifolds. A key outcome of this analysis is that PPCA's parsimony (with respect to the full covariance model) is due to the eigenvalue-equality constraint in the noise space and the subsequent inference of a multidimensional eigenspace. The sequential nature of flag manifolds makes it possible to extend this constraint to the signal space and obtain more parsimonious models. Moreover, the stratification and the partial order it induces on SPCA models yield efficient model selection heuristics. Experiments on simulated and real datasets support equalizing adjacent sample eigenvalues when the gaps between them are small and the number of samples is limited. They notably demonstrate that SPCA models achieve a better complexity/goodness-of-fit tradeoff than PPCA.
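The eigenvalue-equality constraint described above can be illustrated with a minimal sketch. The abstract does not state the estimator, so the code assumes that the fit keeps the sample eigenvectors and replaces each block of adjacent sample eigenvalues by its mean, which is what PPCA does for the noise block. The names `spca_fit`, `spca_num_params`, and `block_sizes` are hypothetical and chosen only for illustration. The parameter count is one eigenvalue per block plus the dimension of the corresponding flag manifold; the mean vector is not counted.

```python
import numpy as np


def spca_fit(X, block_sizes):
    """Block-average the sample eigenvalues (assumed form of the estimate).

    block_sizes lists the multiplicity of each eigenvalue block, ordered
    from the largest eigenvalue to the smallest; it must sum to the dimension.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if sum(block_sizes) != p:
        raise ValueError("block sizes must sum to the data dimension")

    # Sample covariance with the maximum likelihood normalisation (1/n).
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / n

    # eigh returns eigenvalues in ascending order; flip to descending.
    eigvals, eigvecs = np.linalg.eigh(S)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Replace every block of adjacent eigenvalues by the block mean.
    averaged = np.empty(p)
    start = 0
    for size in block_sizes:
        averaged[start:start + size] = eigvals[start:start + size].mean()
        start += size

    # Rebuild the constrained covariance from the sample eigenvectors.
    cov = (eigvecs * averaged) @ eigvecs.T
    return averaged, cov


def spca_num_params(block_sizes):
    """Covariance parameters: one eigenvalue per block plus the flag dimension."""
    p = sum(block_sizes)
    flag_dim = p * (p - 1) // 2 - sum(g * (g - 1) // 2 for g in block_sizes)
    return len(block_sizes) + flag_dim
```

Under these assumptions, `block_sizes=(1, 1, 3)` in dimension 5 corresponds to a PPCA model with two signal directions and an isotropic three-dimensional noise space, while `(2, 3)` additionally equalizes the two signal eigenvalues and therefore uses fewer parameters.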

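In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional one by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the one that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.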
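The construction described above can be sketched in a few lines. This is a minimal illustration using the singular value decomposition of the centered data, which gives the same orthogonal directions as the successive best-fitting lines; the function name `pca` and its return values are hypothetical choices, not an established API.

```python
import numpy as np


def pca(X, n_components):
    """Return principal directions, their variances, and the projected data."""
    X = np.asarray(X, dtype=float)
    # Center the data so that directions pass through the mean.
    Xc = X - X.mean(axis=0)

    # Rows of Vt are orthonormal directions sorted by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]

    # Variance of the data along each retained direction (unbiased, 1/(n-1)).
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)

    # Coordinates of each point in the principal-component basis.
    scores = Xc @ components.T
    return components, explained_variance, scores
```

The columns of `scores` are mutually uncorrelated, which is the property the paragraph above attributes to the principal components.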
