Sparse principal component analysis (SPCA) is widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite many methodological and theoretical developments over the past two decades, theoretical guarantees for the popular SPCA algorithm proposed by Zou, Hastie & Tibshirani (2006) have remained unknown. This paper aims to close this gap. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. We also study a computationally more efficient variant of that algorithm, which can be regarded as the limiting case of SPCA. We establish convergence to a stationary point for both algorithms and prove that, under a sparse spiked covariance model and mild regularity conditions, both algorithms recover the principal subspace consistently. Their estimation error bounds match the best available bounds in existing work or the minimax rates, up to logarithmic factors. Finally, we demonstrate the competitive performance of both algorithms in numerical studies.
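For readers unfamiliar with the computationally cheaper variant mentioned above, the following is a minimal NumPy sketch of the soft-thresholding form of SPCA described in Zou et al. (2006), in which the elastic-net step is replaced by coordinate-wise soft-thresholding. It is an illustration under stated assumptions, not the implementation studied in the paper: the function names `soft_threshold` and `spca_soft_threshold`, the single shared penalty `lam`, the PCA initialization, and the stopping rule are all choices made here for the example.

```python
import numpy as np


def soft_threshold(z, t):
    """Coordinate-wise soft-thresholding: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def spca_soft_threshold(X, k, lam, n_iter=200, tol=1e-8):
    """Alternating sketch of the soft-thresholding SPCA variant.

    X   : (n, p) data matrix, assumed column-centered.
    k   : number of sparse components.
    lam : l1 penalty, shared by all components (an assumption of this sketch).
    Returns a (p, k) matrix of loadings; nonzero columns have unit norm.
    """
    G = X.T @ X  # unnormalized Gram matrix, so lam scales with n

    # Initialize A with the top-k ordinary principal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    A = Vt[:k].T
    B = np.zeros_like(A)

    for _ in range(n_iter):
        # B-step: with A fixed, each column is a soft-thresholded G @ a_j.
        B_new = soft_threshold(G @ A, lam / 2.0)
        if not B_new.any():
            # Penalty too large: every loading was thresholded to zero.
            B = B_new
            break
        # A-step: with B fixed, A = U W^T from the thin SVD of G @ B.
        U, _, Wt = np.linalg.svd(G @ B_new, full_matrices=False)
        A = U @ Wt
        converged = np.linalg.norm(B_new - B) <= tol * max(1.0, np.linalg.norm(B))
        B = B_new
        if converged:
            break

    # Normalize nonzero columns to unit length; leave zero columns as they are.
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0] = 1.0
    return B / norms
```

A call such as `spca_soft_threshold(X, k=2, lam=...)` returns sparse loading vectors whose column span estimates the principal subspace. Because the sketch uses the unnormalized Gram matrix, a suitable `lam` grows with the sample size and has to be tuned for the data at hand.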