This paper examines Principal Component Analysis (PCA), a foundational method in statistics and machine learning for dimensionality reduction. PCA is commonly computed via Singular Value Decomposition (SVD), a procedure that requires centering the data first, that is, subtracting the mean from every observation. We study the effect of this essential but frequently overlooked centering step. Specifically, we investigate the conditions under which two PCA embeddings, one obtained from the SVD of the centered data matrix and the other from the SVD of the uncentered matrix, can be regarded as aligned. To this end, we analyze the relationship between the first singular vector and the mean direction, and connect this relationship to the agreement between the SVDs of the centered and uncentered matrices. We also examine, from a spectral analysis standpoint, the consequences of omitting centering when PCA is performed via SVD. Our results underline the need to treat the computational details of PCA with care, and they contribute to a more precise understanding of this foundational statistical method.
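As an illustration of the comparison described above, the following NumPy sketch computes the SVD of a data matrix with and without centering, then measures how closely the first right singular vector of the uncentered matrix follows the mean direction. The synthetic data, the helper name `svd_embedding`, and the chosen dimensions are assumptions made for this example only; they are not taken from the paper and do not reproduce its analysis.

```python
import numpy as np


def svd_embedding(X, k, center=True):
    """Return the top-k scores, right singular vectors, and singular values.

    Hypothetical helper for illustration: with center=True this is standard
    PCA via SVD; with center=False the column means are left in the data.
    """
    Xw = X - X.mean(axis=0) if center else X
    U, s, Vt = np.linalg.svd(Xw, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k], s[:k]


# Synthetic data (assumption): an anisotropic Gaussian cloud whose mean
# lies far from the origin relative to its spread.
rng = np.random.default_rng(0)
n, d = 500, 5
scales = np.array([3.0, 2.0, 1.0, 0.5, 0.1])
offset = np.array([10.0, -10.0, 5.0, 0.0, 2.0])
X = rng.normal(size=(n, d)) * scales + offset

scores_c, V_c, s_c = svd_embedding(X, k=2, center=True)
scores_u, V_u, s_u = svd_embedding(X, k=3, center=False)

# Cosine between the first uncentered singular vector and the mean direction.
mean_dir = X.mean(axis=0)
mean_dir = mean_dir / np.linalg.norm(mean_dir)
cos_mean = abs(V_u[0] @ mean_dir)

# Cosines of the principal angles between the centered top-2 subspace and
# the subspace spanned by uncentered singular vectors 2 and 3.
overlap = np.linalg.svd(V_c @ V_u[1:3].T, compute_uv=False)

print("|cos(first uncentered singular vector, mean direction)|:", cos_mean)
print("principal-angle cosines (centered vs. shifted uncentered):", overlap)
```

In this setup the mean dominates the second-moment matrix, so the first uncentered singular vector lies very close to the mean direction. Whether the remaining uncentered directions match the centered principal components depends on how the mean direction relates to the covariance structure, which is the kind of condition the paper investigates.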