Uniform error bound for PCA matrix denoising

Principal component analysis (PCA) is a simple and popular tool for processing high-dimensional data. We investigate its effectiveness for matrix denoising. We consider the clean data are generated from a low-dimensional subspace, but masked by independent high-dimensional sub-Gaussian noises with standard deviation $\sigma$. Under the low-rank assumption on the clean data with a mild spectral gap assumption, we prove that the distance between each pair of PCA-denoised data point and the clean data point is uniformly bounded by $O(\sigma \log n)$. To illustrate the spectral gap assumption, we show it can be satisfied when the clean data are independently generated with a non-degenerate covariance matrix. We then provide a general lower bound for the error of the denoised data matrix, which indicates PCA denoising gives a uniform error bound that is rate-optimal. Furthermore, we examine how the error bound impacts downstream applications such as clustering and manifold learning. Numerical results validate our theoretical findings and reveal the importance of the uniform error.

翻译：主成分分析（PCA）是一种处理高维数据的简单而流行的工具。我们研究了其在矩阵去噪中的有效性。考虑干净数据由低维子空间生成，但被标准差为$\sigma$的独立高维次高斯噪声所掩盖。在干净数据具有低秩假设和温和谱间隙假设的条件下，我们证明每个经PCA去噪的数据点与干净数据点之间的距离一致有界于$O(\sigma \log n)$。为阐明谱间隙假设，我们证明当干净数据由非退化协方差矩阵独立生成时该假设成立。随后我们给出去噪数据矩阵误差的通用下界，表明PCA去噪能够提供率达到最优的一致误差界。进一步地，我们考察了该误差界对聚类和流形学习等下游任务的影响。数值实验结果验证了我们的理论发现并揭示了均匀误差的重要性。

相关内容

PCA

关注 3

在统计中，主成分分析（PCA）是一种通过最大化每个维度的方差来将较高维度空间中的数据投影到较低维度空间中的方法。给定二维，三维或更高维空间中的点集合，可以将“最佳拟合”线定义为最小化从点到线的平均平方距离的线。可以从垂直于第一条直线的方向类似地选择下一条最佳拟合线。重复此过程会产生一个正交的基础，其中数据的不同单个维度是不相关的。这些基向量称为主成分。

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

【AI应用】Facebook-利用神经网络求解高等数学方程, Using neural networks to solve advanced mathematics equations

专知会员服务

34+阅读 · 2020年1月15日