Sparse PCA Beyond Covariance Thresholding

In the Wishart model for sparse PCA, we are given $n$ samples $Y_1,\ldots, Y_n$ drawn independently from a $d$-dimensional Gaussian distribution $N(0, \mathrm{Id} + \beta vv^\top)$, where $\beta > 0$ and $v\in \mathbb{R}^d$ is a $k$-sparse unit vector, and we wish to recover $v$ (up to sign). We show that if $n \ge \Omega(d)$, then for every $t \ll k$ there exists an algorithm running in time $n\cdot d^{O(t)}$ that solves this problem as long as \[ \beta \gtrsim \frac{k}{\sqrt{nt}}\sqrt{\ln(2 + td/k^2)}\,. \] Prior to this work, the best polynomial-time algorithm in the regime $k\approx \sqrt{d}$, called \emph{Covariance Thresholding} (proposed in [KNV15a] and analyzed in [DM14]), required $\beta \gtrsim \frac{k}{\sqrt{n}}\sqrt{\ln(2 + d/k^2)}$. For a large enough constant $t$, our algorithm runs in polynomial time and has better guarantees than Covariance Thresholding. Previously known algorithms with such guarantees required quasi-polynomial time $d^{O(\log d)}$. In addition, we show that our techniques extend to sparse PCA with adversarial perturbations, as studied in [dKNS20]. This model generalizes not only sparse PCA but also other problems studied in prior work, including the sparse planted vector problem. As a consequence, we obtain polynomial-time algorithms for the sparse planted vector problem that have better guarantees than the state of the art in some regimes. Our approach also applies to the Wigner model for sparse PCA. Moreover, we show that our techniques can be combined with recent results on sparse PCA with symmetric heavy-tailed noise [dNNS22]. In particular, in the regime $k \approx \sqrt{d}$ we obtain the first polynomial-time algorithm that works with symmetric heavy-tailed noise, whereas the algorithm from [dNNS22] requires quasi-polynomial time in this setting.
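To make the model and the two thresholds concrete, here is a small NumPy sketch. It samples from the spiked Wishart model $N(0, \mathrm{Id} + \beta vv^\top)$ and runs a simplified form of the Covariance Thresholding baseline: soft-threshold the entries of the empirical covariance minus the identity at a level $\tau/\sqrt{n}$, then take the top eigenvector. This illustrates only the baseline, not the new $n\cdot d^{O(t)}$-time algorithm, which the abstract does not describe; the sample-splitting and cleanup steps of [DM14] are omitted, and the constant $\tau$, the sign pattern of $v$, and all parameter values are hypothetical choices. The helper `threshold_ratio` compares the two sufficient conditions on $\beta$ stated above, ignoring the constants hidden in $\gtrsim$.

```python
import numpy as np


def sample_spiked_wishart(n, d, k, beta, rng):
    """Draw n samples from N(0, Id + beta * v v^T) with a random k-sparse unit vector v."""
    support = rng.choice(d, size=k, replace=False)
    v = np.zeros(d)
    # Hypothetical choice of v: random signs with equal magnitude 1/sqrt(k), so ||v|| = 1.
    v[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    # Y_i = g_i + sqrt(beta) * z_i * v with independent g_i ~ N(0, Id), z_i ~ N(0, 1)
    # has covariance Id + beta * v v^T.
    g = rng.standard_normal((n, d))
    z = rng.standard_normal(n)
    Y = g + np.sqrt(beta) * np.outer(z, v)
    return Y, v


def covariance_thresholding(Y, tau):
    """Simplified covariance thresholding (no sample splitting or cleanup step)."""
    n, d = Y.shape
    sigma_hat = Y.T @ Y / n                  # empirical covariance; the mean is known to be 0
    m = sigma_hat - np.eye(d)
    level = tau / np.sqrt(n)                 # hypothetical threshold tau / sqrt(n)
    m_thr = np.sign(m) * np.maximum(np.abs(m) - level, 0.0)  # entrywise soft thresholding
    _, eigvecs = np.linalg.eigh(m_thr)       # eigenvalues come back in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue


def threshold_ratio(t, d, k):
    """(new sufficient beta) / (Covariance Thresholding beta), ignoring constants in '>~'.

    The common factor k / sqrt(n) cancels, so n does not appear.
    """
    new = np.sqrt(np.log(2 + t * d / k**2) / t)
    old = np.sqrt(np.log(2 + d / k**2))
    return new / old


rng = np.random.default_rng(0)
Y, v = sample_spiked_wishart(n=1000, d=200, k=8, beta=2.0, rng=rng)
v_hat = covariance_thresholding(Y, tau=3.0)
overlap = abs(v_hat @ v)                     # v is only recoverable up to sign
print(f"|<v_hat, v>| = {overlap:.3f}")

# Regime k = sqrt(d): compare the two conditions for a few values of t.
for t in (1, 2, 4, 8):
    print(t, round(float(threshold_ratio(t, d=10_000, k=100)), 3))
```

With $k = \sqrt{d}$ (so $d/k^2 = 1$), the ratio is exactly 1 at $t = 1$ and shrinks as $t$ grows, for example to roughly 0.64 at $t = 4$, which is the sense in which a larger constant $t$ improves on Covariance Thresholding.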


Related content

PCA


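In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional one by choosing directions that successively maximize the variance of the projected data. Given a collection of points in two-, three-, or higher-dimensional space, the "best-fitting" line can be defined as the one that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from among the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.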
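For centered data, the greedy best-fit-line procedure in the definition above is equivalent to taking eigenvectors of the sample covariance matrix in order of decreasing eigenvalue (a standard fact, not stated in the text). The minimal NumPy sketch below uses that formulation on hypothetical toy data and checks two claims of the paragraph: the projected coordinates are uncorrelated, and the first component gives the smallest average squared distance to a line through the mean.

```python
import numpy as np


def pca(X, num_components):
    """Return the top principal directions (as orthonormal columns) and the projected scores."""
    Xc = X - X.mean(axis=0)                      # center so best-fit lines pass through the mean
    cov = Xc.T @ Xc / (len(X) - 1)               # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:num_components]
    components = eigvecs[:, order]               # largest-variance directions first
    scores = Xc @ components                     # coordinates in the new basis
    return components, scores


def mean_sq_distance_to_line(X, direction):
    """Average squared distance from the points to the line through the mean along direction."""
    Xc = X - X.mean(axis=0)
    u = direction / np.linalg.norm(direction)
    residual = Xc - np.outer(Xc @ u, u)          # remove each point's projection onto the line
    return np.mean(np.sum(residual**2, axis=1))


rng = np.random.default_rng(1)
# Hypothetical correlated 3-D toy data (not from the source).
A = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.2, 0.3]])
X = rng.standard_normal((500, 3)) @ A
components, scores = pca(X, num_components=3)

# Off-diagonal covariances of the scores are zero up to rounding error.
print(np.round(np.cov(scores, rowvar=False), 6))
# The first component fits better than an arbitrary direction such as the third axis.
print(mean_sq_distance_to_line(X, components[:, 0]),
      mean_sq_distance_to_line(X, np.array([0.0, 0.0, 1.0])))
```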
