Thresholded Oja does Sparse PCA?

We consider the problem of Sparse Principal Component Analysis (PCA) when the ratio $d/n \rightarrow c > 0$. There has been a lot of work on optimal rates for sparse PCA in the offline setting, where all the data is available for multiple passes. In contrast, when the population eigenvector is $s$-sparse, streaming algorithms with $O(d)$ storage and $O(nd)$ time complexity typically either require strong initialization conditions or have a suboptimal error. We show that a simple algorithm that thresholds and renormalizes the output of Oja's algorithm (the Oja vector) obtains a near-optimal error rate. This is very surprising because, without thresholding, the Oja vector has a large error. Our analysis centers on bounding the entries of the unnormalized Oja vector, which involves the projection of a product of independent random matrices onto a random initial vector. This is nontrivial and novel because previous analyses of Oja's algorithm and matrix products assume that the trace of the population covariance matrix is bounded, whereas in our setting this quantity can be as large as $n$.
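As a rough illustration of the procedure described above, the following sketch runs a single streaming pass of Oja's update, $w \leftarrow w + \eta\, x_i x_i^\top w$, and then thresholds and renormalizes the result. Several choices here are assumptions and are not taken from the abstract: the thresholding rule (keep the `k` largest-magnitude entries), the learning rate `eta`, the spiked covariance model $I + \lambda v v^\top$ used to generate data, and all function names. Normalizing after every step is only for numerical stability; it does not change the direction of the unnormalized Oja vector $(I + \eta x_n x_n^\top) \cdots (I + \eta x_1 x_1^\top) u_0$.

```python
import numpy as np


def sample_spiked(n, d, s, lam, rng):
    """Draw n samples with covariance I + lam * v v^T, where v is s-sparse.

    Hypothetical data model chosen for illustration only.
    """
    v = np.zeros(d)
    v[:s] = 1.0 / np.sqrt(s)  # unit-norm vector supported on the first s coordinates
    noise = rng.standard_normal((n, d))
    spike = np.sqrt(lam) * rng.standard_normal((n, 1)) * v
    return noise + spike, v


def oja_vector(X, eta, rng):
    """One pass of Oja's algorithm over the rows of X, using O(d) memory."""
    d = X.shape[1]
    w = rng.standard_normal(d)  # random initial vector
    w /= np.linalg.norm(w)
    for x in X:
        w += eta * x * (x @ w)  # w <- (I + eta * x x^T) w
        w /= np.linalg.norm(w)  # rescale for stability; direction is unchanged
    return w


def threshold_top_k(w, k):
    """Keep the k largest-magnitude entries of w, zero the rest, renormalize.

    Assumed thresholding rule; the abstract does not specify one.
    """
    keep = np.argpartition(np.abs(w), -k)[-k:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out / np.linalg.norm(out)


def sin2_error(v_hat, v):
    """Squared sine of the angle between unit vectors v_hat and v."""
    return 1.0 - float(v_hat @ v) ** 2


rng = np.random.default_rng(0)
n, d, s, lam = 2000, 200, 5, 4.0
X, v = sample_spiked(n, d, s, lam, rng)
eta = np.log(n) / (n * lam)  # assumed step size, scaled by the eigengap lam

w_oja = oja_vector(X, eta, rng)
w_thr = threshold_top_k(w_oja, k=s)
print("sin^2 error, Oja vector:        ", sin2_error(w_oja, v))
print("sin^2 error, thresholded vector:", sin2_error(w_thr, v))
```

The two printed values compare the plain Oja vector against its thresholded version on one synthetic draw; the sketch makes no claim about matching the rates stated in the abstract.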

PCA

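In statistics, principal component analysis (PCA) is a method for projecting data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a set of points in two, three, or more dimensions, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.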