This paper investigates a general family of models that stratifies the space of covariance matrices by eigenvalue multiplicity. This family, which we call Stratified Principal Component Analysis (SPCA), includes in particular Probabilistic PCA (PPCA) models, where the noise component is assumed to be isotropic. We provide an explicit maximum likelihood estimate and a geometric characterization relying on flag manifolds. A key outcome of this analysis is that PPCA's parsimony (with respect to the full covariance model) stems from the eigenvalue-equality constraint in the noise space and the subsequent inference of a multidimensional eigenspace. The sequential nature of flag manifolds makes it possible to extend this constraint to the signal space and obtain more parsimonious models. Moreover, the stratification and the induced partial order on SPCA models yield efficient model selection heuristics. Experiments on simulated and real datasets substantiate the benefit of equalising adjacent sample eigenvalues when the gaps between them are small and the number of samples is limited. They notably demonstrate that SPCA models achieve a better complexity/goodness-of-fit tradeoff than PPCA.
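The complexity/goodness-of-fit tradeoff can be illustrated with a minimal sketch. It assumes that, for a given multiplicity sequence (q_1, ..., q_d) summing to the dimension p (called a "type" here), the maximum likelihood eigenvalues are the averages of the sorted sample eigenvalues within each block, which generalizes PPCA's estimate of the noise variance as the mean of the trailing eigenvalues. The parameter count is taken as the mean (p) plus the d distinct eigenvalues plus the flag manifold dimension (p^2 - sum q_k^2)/2. The use of BIC as the selection criterion, the function names, and the eigenvalues in the usage example are all hypothetical illustrations, not the paper's code or data.

```python
import math
from typing import List, Sequence


def block_means(eigvals: Sequence[float], type_: Sequence[int]) -> List[float]:
    """Hypothetical helper: average the descending sample eigenvalues within
    each block of the type (assumed ML eigenvalue estimate)."""
    if sum(type_) != len(eigvals) or any(q <= 0 for q in type_):
        raise ValueError("type must be positive integers summing to p")
    l = sorted(eigvals, reverse=True)
    means, start = [], 0
    for q in type_:
        block = l[start:start + q]
        means.append(sum(block) / q)
        start += q
    return means


def n_params(type_: Sequence[int]) -> int:
    """Mean (p) + distinct eigenvalues (d) + flag manifold dimension."""
    p, d = sum(type_), len(type_)
    flag_dim = (p * p - sum(q * q for q in type_)) // 2  # always an integer
    return p + d + flag_dim


def max_loglik(eigvals: Sequence[float], type_: Sequence[int], n: int) -> float:
    """Gaussian log-likelihood at the assumed ML estimate.
    With block means, sum_j l_j / lambda_j equals p, so the trace term is p."""
    p = sum(type_)
    means = block_means(eigvals, type_)
    logdet = sum(q * math.log(m) for q, m in zip(type_, means))
    return -0.5 * n * (p * math.log(2 * math.pi) + logdet + p)


def bic(eigvals: Sequence[float], type_: Sequence[int], n: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params(type_) * math.log(n) - 2 * max_loglik(eigvals, type_, n)


if __name__ == "__main__":
    # Made-up sample eigenvalues with two small gaps, p = 5, n = 50 samples.
    ev = [5.0, 4.8, 1.1, 1.0, 0.9]
    for t in [(1, 1, 1, 1, 1), (1, 1, 3), (2, 3)]:
        print(t, n_params(t), round(bic(ev, t, 50), 2))
```

For these made-up eigenvalues, the full model (1, 1, 1, 1, 1) has 20 parameters, the PPCA model (1, 1, 3) has 15, and the SPCA model (2, 3), which also equalises the two leading eigenvalues, has 13. Because the lost log-likelihood is tiny when the gap is small, the (2, 3) model gets the lowest BIC here.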