In this paper, we study the estimation of the $k$-dimensional sparse principal subspace of a covariance matrix $\Sigma$ in the high-dimensional setting. We aim to recover the oracle principal subspace solution, i.e., the principal subspace estimator obtained when the true support is known a priori. To this end, we propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations. In particular, under a weak assumption on the magnitude of the population projection matrix, one estimator within this family exactly recovers the true support with high probability, is exactly of rank $k$, and attains a $\sqrt{s/n}$ statistical rate of convergence, where $s$ is the subspace sparsity level and $n$ is the sample size. Compared with existing support recovery results for sparse PCA, our approach does not hinge on the spiked covariance model or the limited correlation condition. As a complement to the first estimator, which enjoys the oracle property, we prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA, even when the above assumption on the magnitude of the projection matrix is violated. We validate the theoretical results through numerical experiments on synthetic datasets.
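The abstract does not spell out the novel regularizers, so the sketch below covers only the two reference points it compares against. The first is the oracle estimator: the projection onto the top-$k$ eigenvectors of the sample covariance $\hat\Sigma$ restricted to the true support $S$. The second is a standard semidefinite relaxation. We assume it takes the Fantope form $\max_{\Pi} \langle \hat\Sigma, \Pi\rangle - \lambda \|\Pi\|_{1,1}$ subject to $0 \preceq \Pi \preceq I$ and $\operatorname{tr}(\Pi) = k$ (Vu et al., 2013), solved here by ADMM. All function names (`oracle_projection`, `fantope_projection`, `fps_admm`) are hypothetical helpers, not the authors' code. The toy data use a spiked model purely for illustration, even though the results above do not require one.

```python
import numpy as np


def oracle_projection(sigma_hat, support, k):
    """Oracle estimator (hypothetical helper): projection onto the top-k
    eigenvectors of the sample covariance restricted to a known support."""
    d = sigma_hat.shape[0]
    idx = np.asarray(support)
    # eigh returns eigenvalues in ascending order, so the last k columns
    # are the leading eigenvectors of the s-by-s submatrix
    _, vecs = np.linalg.eigh(sigma_hat[np.ix_(idx, idx)])
    u = np.zeros((d, k))
    u[idx, :] = vecs[:, -k:]
    return u @ u.T  # exactly rank k, nonzero only on support x support


def fantope_projection(a, k, tol=1e-10):
    """Euclidean projection of a symmetric matrix onto the Fantope
    {P : 0 <= P <= I, tr(P) = k}: shift and clip eigenvalues to [0, 1]."""
    vals, vecs = np.linalg.eigh(a)
    # bisection for the shift theta with sum(clip(vals - theta, 0, 1)) == k
    lo, hi = vals.min() - 1.0, vals.max()
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if np.clip(vals - theta, 0.0, 1.0).sum() > k:
            lo = theta
        else:
            hi = theta
    gamma = np.clip(vals - 0.5 * (lo + hi), 0.0, 1.0)
    return (vecs * gamma) @ vecs.T  # V diag(gamma) V^T


def soft_threshold(a, tau):
    """Entrywise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)


def fps_admm(sigma_hat, k, lam, rho=1.0, n_iter=500):
    """ADMM for max <sigma_hat, X> - lam * ||X||_{1,1} over the Fantope
    (hypothetical helper; fixed iteration count, no convergence check)."""
    d = sigma_hat.shape[0]
    y = np.zeros((d, d))
    u = np.zeros((d, d))
    for _ in range(n_iter):
        x = fantope_projection(y - u + sigma_hat / rho, k)
        y = soft_threshold(x + u, lam / rho)
        u = u + x - y
    return y  # sparse iterate; unlike the oracle, not guaranteed rank k


# Toy data (assumed spiked model with support {0, 1, 2, 3}, illustration only)
rng = np.random.default_rng(0)
d, k, n = 20, 2, 500
support = [0, 1, 2, 3]
v = np.zeros((d, k))
v[support, :] = np.linalg.qr(rng.standard_normal((len(support), k)))[0]
p_true = v @ v.T
sigma = np.eye(d) + 5.0 * p_true
samples = rng.multivariate_normal(np.zeros(d), sigma, size=n)
sigma_hat = np.cov(samples, rowvar=False)

p_oracle = oracle_projection(sigma_hat, support, k)
p_fps = fps_admm(sigma_hat, k, lam=0.1)
print("oracle error:", np.linalg.norm(p_oracle - p_true))
print("FPS error:   ", np.linalg.norm(p_fps - p_true))
```

The oracle estimator is only computable when $S$ is known. The convex relaxation must infer the support through the $\ell_{1,1}$ penalty, and its solution need not have rank exactly $k$. This is the gap the proposed estimators are designed to close.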