This article studies the robustness of eigenvalue ordering, an important issue when estimating the leading eigen-subspace by principal component analysis (PCA). Yata and Aoshima (2010) proposed cross-data-matrix PCA (CDM-PCA) and showed that it has smaller bias than PCA in estimating eigenvalues. While CDM-PCA has the potential to estimate the leading eigen-subspace better than the usual PCA, its robustness is not well recognized. In this article, we first develop a more stable variant of CDM-PCA, which we call product-PCA (PPCA), that provides a more convenient formulation for theoretical investigation. Second, we prove that, in the presence of outliers, PPCA is more robust than PCA in maintaining the correct ordering of the leading eigenvalues. The robustness gain of PPCA comes from the random data partition and does not rely on a data down-weighting scheme, as most robust statistical methods do. This enables us to establish the surprising finding that, when there are no outliers, PPCA and PCA share the same asymptotic distribution. That is, the robustness gain of PPCA in estimating the leading eigen-subspace incurs no efficiency loss relative to PCA. Simulation studies and a face data example illustrate the merits of PPCA. In conclusion, PPCA has good potential to replace the usual PCA in real applications, whether or not outliers are present.