Robust covariance estimation for distributed principal component analysis

Fan et al. [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{47}$(6) (2019) 3009-3031] constructed a distributed principal component analysis (PCA) algorithm that significantly reduces the communication cost between multiple servers. However, their algorithm is guaranteed only for sub-Gaussian data. Motivated by this limitation, this paper extends their distributed PCA algorithm to heavy-tailed data by using the robust covariance matrix estimators of Minsker [$\mathit{Annals}$ $\mathit{of}$ $\mathit{Statistics}$ $\textbf{46}$(6A) (2018) 2871-2903] and Ke et al. [$\mathit{Statistical}$ $\mathit{Science}$ $\textbf{34}$(3) (2019) 454-471]. The theoretical results show that when the sampling distribution is a symmetric innovation with bounded fourth moment, or is asymmetric with finite sixth moment, the statistical error rate of the final estimator produced by the robust algorithm is similar to the rate obtained under sub-Gaussian tails. Extensive numerical trials support the theoretical analysis and indicate that the algorithm is robust to heavy-tailed data and outliers.
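The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes zero-mean data, a hand-picked truncation level `tau`, and a spectrum-wise truncated covariance in the spirit of the estimators cited above; the function names (`truncated_covariance`, `distributed_robust_pca`) are hypothetical. Each server computes the top-`k` eigenvectors of its local robust covariance estimate, and the central server averages the resulting projection matrices and extracts the top-`k` eigenvectors of that average.

```python
import numpy as np

def truncated_covariance(X, tau):
    """Spectrum-wise truncated covariance for rows of X (assumed zero-mean).

    Each rank-one term x x^T has one nonzero eigenvalue, ||x||^2, so capping
    that eigenvalue at tau rescales the term by min(1, tau / ||x||^2).
    """
    sq_norms = np.sum(X**2, axis=1)
    weights = np.minimum(1.0, tau / np.maximum(sq_norms, 1e-12))
    return (X * weights[:, None]).T @ X / X.shape[0]

def top_k_eigvecs(S, k):
    # eigh returns eigenvalues in ascending order; keep the last k columns
    _, vecs = np.linalg.eigh(S)
    return vecs[:, -k:]

def distributed_robust_pca(shards, k, tau):
    """Average the local projection matrices and return the top-k eigenvectors."""
    d = shards[0].shape[1]
    avg_proj = np.zeros((d, d))
    for X in shards:  # in practice each iteration runs on its own server
        V = top_k_eigvecs(truncated_covariance(X, tau), k)
        avg_proj += V @ V.T / len(shards)  # only the d x k matrix V is communicated
    return top_k_eigvecs(avg_proj, k)

# Illustrative heavy-tailed data: Student-t entries with one dominant direction.
rng = np.random.default_rng(0)
scales = np.ones(10)
scales[0] = 5.0
shards = [rng.standard_t(5, size=(500, 10)) * scales for _ in range(5)]
V_hat = distributed_robust_pca(shards, k=1, tau=200.0)
```

Here `tau` is fixed by hand for illustration only; in practice it is a tuning parameter that would be chosen from the sample size, the dimension, and moment bounds.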



PCA


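In statistics, principal component analysis (PCA) is a method that projects data from a higher-dimensional space into a lower-dimensional space by maximizing the variance along each dimension. Given a collection of points in two-, three-, or higher-dimensional space, a "best-fitting" line can be defined as the line that minimizes the average squared distance from the points to the line. The next best-fitting line can be chosen in the same way from the directions perpendicular to the first. Repeating this process yields an orthogonal basis in which the individual dimensions of the data are uncorrelated. These basis vectors are called principal components.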
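As a small illustration of the construction above, the following sketch computes principal components from the eigendecomposition of the sample covariance matrix. The function name `pca` and the synthetic data are assumptions made for the example and do not come from the source.

```python
import numpy as np

def pca(X, k):
    """Return the top-k principal directions and the projected data."""
    Xc = X - X.mean(axis=0)                 # center each coordinate
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)     # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]   # indices of the k largest
    components = eigvecs[:, order]          # orthonormal columns
    return components, Xc @ components

# Synthetic correlated data in three dimensions (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 1.0, 0.0],
                   [0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
components, scores = pca(X, k=2)
```

The columns of `components` are orthonormal, and the columns of `scores` are uncorrelated with each other, which is the property the paragraph describes.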
