Principal component analysis (PCA) is a widely used dimension-reduction method, but its performance is known to be non-robust to outliers. Recently, product-PCA (PPCA) has been shown to possess an efficiency-loss-free ordering-robustness property: (i) in the absence of outliers, PPCA and PCA share the same asymptotic distributions; (ii) in the presence of outliers, PPCA is more ordering-robust than PCA in estimating the leading eigenspace. PPCA thus differs from conventional robust PCA methods and deserves further investigation. In this article, we study the high-dimensional statistical properties of the PPCA eigenvalues using techniques from random matrix theory. In particular, we derive the critical value for an eigenvalue to be a distant spike, the limiting values of the sample spiked eigenvalues, and the limiting spectral distribution of PPCA. As in the case of PCA, explicit forms of these asymptotic properties become available in the special case of the simple spiked model. These results allow us to understand more clearly the advantages of PPCA over PCA. Numerical studies are conducted to verify our results.
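As a minimal illustration of the spiked-eigenvalue phenomenon the abstract refers to, the sketch below simulates the classical result for ordinary PCA under a simple spiked model: a population eigenvalue ℓ exceeding the critical value 1 + √c (with c = p/n) yields a sample eigenvalue converging to ℓ(1 + c/(ℓ − 1)). This is a standard PCA simulation under hypothetical parameter choices, not an implementation of PPCA or of the paper's results.

```python
# Simulation of the classical spiked-eigenvalue limit for ordinary PCA
# under a simple spiked model (hypothetical parameters; PPCA is not implemented here).
import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 200                    # sample size and dimension, so c = p/n = 0.5
c = p / n
spike = 5.0                        # population spike, above the critical value 1 + sqrt(c)

# Population covariance: one spiked eigenvalue, all remaining eigenvalues equal to 1.
sigma = np.ones(p)
sigma[0] = spike

# Rows of X are i.i.d. N(0, diag(sigma)); form the sample covariance matrix.
X = rng.standard_normal((n, p)) * np.sqrt(sigma)
S = X.T @ X / n

top_eig = np.linalg.eigvalsh(S)[-1]          # largest sample eigenvalue
limit = spike * (1 + c / (spike - 1))        # classical limit: 5 * (1 + 0.5/4) = 5.625

print(f"critical value 1 + sqrt(c) = {1 + np.sqrt(c):.3f}")
print(f"sample top eigenvalue      = {top_eig:.3f}  (limit {limit:.3f})")
```

The sample top eigenvalue overshoots the population spike by the factor (1 + c/(ℓ − 1)), which is the kind of high-dimensional bias whose analogue for PPCA is characterized in this article.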