In streaming PCA, we see a stream of vectors $x_1, \dotsc, x_n \in \mathbb{R}^d$ and want to estimate the top eigenvector of their covariance matrix. This is easier if the spectral ratio $R = \lambda_1 / \lambda_2$ between the top two eigenvalues is large. We ask: how large does $R$ need to be to solve streaming PCA in $\widetilde{O}(d)$ space? Existing algorithms require $R = \widetilde{\Omega}(d)$. We show: (1) For all mergeable summaries, $R = \widetilde{\Omega}(\sqrt{d})$ is necessary. (2) In the insertion-only model, a variant of Oja's algorithm achieves $o(1)$ error for $R = O(\log n \log d)$. (3) No algorithm with $o(d^2)$ space achieves $o(1)$ error for $R = O(1)$. Our analysis is the first application of Oja's algorithm to adversarial streams. It is also the first algorithm for adversarial streaming PCA that is designed for a spectral, rather than Frobenius, bound on the tail; and the bound it needs is exponentially better than is possible by adapting a Frobenius guarantee.
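For orientation, the classical Oja update that result (2) builds on can be sketched as follows. This is the textbook rule $v \leftarrow (v + \eta\, x x^\top v) / \lVert v + \eta\, x x^\top v \rVert$, not the variant analyzed in this work; the function names, the fixed step size `eta`, and the synthetic Gaussian stream are illustrative assumptions. The sketch stores only the current estimate $v$, so it uses $O(d)$ space.

```python
import math
import random


def normalize(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def oja_top_eigenvector(stream, d, eta, seed=0):
    """Classical Oja update with a fixed step size eta (an assumption).

    Keeps a single unit vector v in R^d and, for each arriving x,
    moves v toward x * <x, v> and renormalizes.
    """
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
    for x in stream:
        dot = sum(xi * vi for xi, vi in zip(x, v))
        v = normalize([vi + eta * dot * xi for vi, xi in zip(v, x)])
    return v


def synthetic_stream(n, d, ratio, seed=0):
    """Hypothetical test stream: independent Gaussian coordinates whose
    population covariance is diag(ratio, 1, ..., 1), so lambda_1 / lambda_2
    equals ratio and the top eigenvector is the first basis vector."""
    rng = random.Random(seed)
    scale = math.sqrt(ratio)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x[0] *= scale
        yield x
```

On such a stream, calling `oja_top_eigenvector(synthetic_stream(n=2000, d=5, ratio=25.0), d=5, eta=0.005)` should return a unit vector concentrated on the first coordinate, up to sign. This stream is random rather than adversarial, so the sketch only illustrates the update rule and says nothing about the adversarial guarantees claimed above.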