Outliers that contaminate data sets pose a challenge for statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis based on the generalized spatial sign covariance matrix. We study theoretical properties of the proposed method, including influence functions, breakdown values, and asymptotic efficiencies, and conduct a simulation study comparing the new method to existing methods. We also propose an adjustment of the generalized spatial sign covariance matrix that achieves better Fisher consistency properties. We illustrate that, depending on the chosen radial function, generalized spherical principal component analysis combines strong robustness and efficiency properties with a low computational cost.
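To make the construction concrete, the following is a minimal sketch of how a principal component analysis based on a generalized spatial sign covariance matrix might be computed. It is not the authors' implementation. It assumes the matrix takes the form of an average of outer products of centered observations, each rescaled by a radial function ξ of its distance to the center, with ξ(r) = 1/r giving the classical spatial sign covariance matrix. The function names `gsscm`, `generalized_spherical_pca`, and `winsor_xi`, the coordinatewise median as center, and the median-radius cutoff are all illustrative assumptions; the Fisher consistency adjustment mentioned in the abstract is not included.

```python
import numpy as np


def gsscm(X, xi=None, center=None):
    """Sketch of a generalized spatial sign covariance matrix.

    Assumed form: mean of xi(r_i)^2 * (x_i - mu)(x_i - mu)^T, r_i = ||x_i - mu||.
    """
    X = np.asarray(X, dtype=float)
    if center is None:
        # Assumption: coordinatewise median as a simple robust center.
        center = np.median(X, axis=0)
    Z = X - center
    r = np.linalg.norm(Z, axis=1)
    if xi is None:
        # Default radial function xi(r) = 1/r (classical spatial sign);
        # observations that coincide with the center get weight 0.
        w = np.zeros_like(r)
        nonzero = r > 0
        w[nonzero] = 1.0 / r[nonzero]
    else:
        w = xi(r)
    Zw = Z * w[:, None]
    return Zw.T @ Zw / X.shape[0]


def winsor_xi(r):
    """Hypothetical Winsor-style radial function: keep points inside a
    cutoff radius unchanged and shrink the rest onto that radius."""
    cutoff = np.median(r)  # assumption: median radius as the cutoff
    w = np.ones_like(r)
    outside = r > cutoff
    w[outside] = cutoff / r[outside]
    return w


def generalized_spherical_pca(X, n_components, xi=None):
    """Leading eigenvectors (as columns) and eigenvalues of the sketch matrix."""
    S = gsscm(X, xi=xi)
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order], eigvals[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(500, 2)) * np.array([5.0, 1.0])
    outliers = np.tile([0.0, 1000.0], (25, 1))  # gross outliers along axis 2
    data = np.vstack([clean, outliers])
    directions, _ = generalized_spherical_pca(data, 1, xi=winsor_xi)
    print(directions.ravel())
```

Because each observation is downweighted according to its distance from the center, a few gross outliers contribute only a bounded amount to the matrix, and the whole computation is a single pass over the data followed by one eigendecomposition.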