This work studies estimation of sparse principal components in high dimensions. Specifically, we consider a class of estimators based on kernel PCA, generalizing the covariance thresholding algorithm proposed by Krauthgamer et al. (2015). Focusing on Johnstone's spiked covariance model, we investigate the "critical" sparsity regime, where the sparsity level $m$, sample size $n$, and dimension $p$ each diverge and $m/\sqrt{n} \rightarrow \beta$, $p/n \rightarrow \gamma$. Within this framework, we develop a fine-grained understanding of signal detection and recovery. Our results establish a detectability phase transition, analogous to the Baik--Ben Arous--P\'ech\'e (BBP) transition: above a certain threshold -- depending on the kernel function, $\gamma$, and $\beta$ -- kernel PCA is informative. Conversely, below the threshold, kernel principal components are asymptotically orthogonal to the signal. Notably, above this detection threshold, we find that consistent support recovery is possible with high probability. Sparsity plays a key role in our analysis, and results in more nuanced phenomena than in related studies of kernel PCA with delocalized (dense) components. Finally, we identify optimal kernel functions for detection -- and consequently, support recovery -- and numerical calculations suggest that soft thresholding is nearly optimal.
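As a rough illustration of the estimator class described above, the following Python sketch draws data from a spiked covariance model with an $m$-sparse spike, applies a kernel function entrywise to the rescaled sample covariance, and takes the leading eigenvector. The normalization ($\sqrt{n}$ scaling, zeroed diagonal), the function names, the threshold `tau = 2.0`, and the demo parameters are all assumptions made for illustration, not the paper's exact definitions.

```python
import numpy as np


def soft_threshold(x, tau):
    """Entrywise soft thresholding: shrink |x| by tau and clip at zero."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def sample_spiked(n, p, m, snr, rng):
    """Draw n samples from N(0, I_p + snr * v v^T) with an m-sparse unit vector v.

    The +-1/sqrt(m) entries on the support are an illustrative choice.
    """
    v = np.zeros(p)
    support = rng.choice(p, size=m, replace=False)
    v[support] = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
    u = rng.standard_normal(n)
    X = np.sqrt(snr) * np.outer(u, v) + rng.standard_normal((n, p))
    return X, v, support


def kernel_pca_top_vector(X, kernel):
    """Leading eigenvector of the kernel matrix built from the sample covariance.

    Assumed normalization: A_ij = kernel(sqrt(n) * S_ij) / sqrt(n) for i != j,
    with the diagonal set to zero.
    """
    n, _ = X.shape
    S = X.T @ X / n
    A = kernel(np.sqrt(n) * S) / np.sqrt(n)
    np.fill_diagonal(A, 0.0)
    # eigh returns eigenvalues in ascending order, so the last column is the top one.
    _, eigvecs = np.linalg.eigh(A)
    return eigvecs[:, -1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, m = 2000, 400, 20  # m / sqrt(n) is roughly 0.45, p / n = 0.2
    X, v, support = sample_spiked(n, p, m, snr=5.0, rng=rng)
    tau = 2.0  # hypothetical threshold level
    v_hat = kernel_pca_top_vector(X, lambda a: soft_threshold(a, tau))
    # The eigenvector sign is arbitrary, so report the absolute overlap.
    print("overlap |<v_hat, v>|:", abs(v_hat @ v))
    # Naive support estimate: the m largest coordinates of |v_hat|.
    est_support = np.argsort(-np.abs(v_hat))[:m]
    print("support hits:", len(set(est_support) & set(support)), "of", m)
```

Passing a different entrywise function as `kernel` (for example hard thresholding, or the identity for ordinary PCA on the off-diagonal part) gives other members of the same family, which makes it easy to compare overlaps empirically at fixed $n$, $p$, and $m$.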