We analyze the prediction error of principal component regression (PCR) and prove high-probability bounds for the corresponding squared risk conditional on the design. Our first main result shows that PCR performs comparably to the oracle method obtained by replacing the empirical principal components with their population counterparts, provided that an effective rank condition holds. If this condition is violated, on the other hand, the empirical eigenvalues exhibit a significant upward bias, resulting in a self-induced regularization of PCR. Our approach relies on the behavior of empirical eigenvalues, empirical eigenvectors, and the excess risk of principal component analysis in high-dimensional regimes.
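To make the comparison between PCR and its oracle counterpart concrete, the following is a minimal simulation sketch, not the construction analyzed in the paper. Everything in it is an illustrative assumption: the Gaussian design with a diagonal population covariance (so that the population principal components are the first coordinate axes), the chosen spectrum, the sample sizes, and the helper names `top_eigvecs`, `pcr_fit`, and `simulate`. The error reported is the in-sample squared prediction error, computed for the fixed design that was drawn.

```python
import numpy as np


def top_eigvecs(S, k):
    """Return the k largest eigenvalues of the symmetric matrix S and their eigenvectors."""
    w, V = np.linalg.eigh(S)            # eigh returns eigenvalues in ascending order
    order = np.argsort(w)[::-1][:k]     # indices of the k largest eigenvalues
    return w[order], V[:, order]


def pcr_fit(X, y, components):
    """Least squares of y on the scores X @ components, mapped back to a p-vector."""
    Z = X @ components                  # n x k matrix of principal component scores
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return components @ coef


def simulate(n=200, p=50, k=3, noise_sd=1.0, seed=0):
    """Compare PCR (empirical components) with the oracle (population components)."""
    rng = np.random.default_rng(seed)
    # Assumed spectrum: k large eigenvalues followed by a flat tail.
    pop_eigvals = np.concatenate([np.linspace(10.0, 3.0, k), 0.1 * np.ones(p - k)])
    X = rng.standard_normal((n, p)) * np.sqrt(pop_eigvals)   # covariance = diag(pop_eigvals)
    beta = np.zeros(p)
    beta[:k] = 1.0                      # signal lies in the top-k population eigenspace
    y = X @ beta + noise_sd * rng.standard_normal(n)

    S = X.T @ X / n                     # empirical covariance (mean known to be zero)
    emp_eigvals, emp_vecs = top_eigvecs(S, k)
    pop_vecs = np.eye(p)[:, :k]         # population components for a diagonal covariance

    beta_pcr = pcr_fit(X, y, emp_vecs)
    beta_oracle = pcr_fit(X, y, pop_vecs)

    def pred_error(b):
        # In-sample squared prediction error for the design X that was drawn.
        return float(np.sum((X @ (b - beta)) ** 2) / n)

    return {
        "pcr_error": pred_error(beta_pcr),
        "oracle_error": pred_error(beta_oracle),
        "empirical_eigvals": emp_eigvals,
        "population_eigvals": pop_eigvals[:k],
    }


if __name__ == "__main__":
    result = simulate()
    for name, value in result.items():
        print(name, value)
```

Increasing `p` relative to `n`, or raising the tail eigenvalues, is one way to explore how the empirical eigenvalues move away from the population ones in this toy setup; the sketch makes no claim about where the paper's effective rank condition starts to fail.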