Principal component regression is a popular method when the predictor matrix in a regression has reduced column rank. It has been proposed as a way to stabilize computation under such conditions and to improve prediction accuracy by reducing the variance of the least squares estimator of the regression slopes. However, it adds the difficulty of deciding which principal components to include in the regression. I argue against selecting principal components by the magnitude of their associated eigenvalues, by examining the estimator of the residual variance and the contribution of the residual variance to the variance of the estimator of the regression slopes. I show that when a principal component that is important in explaining the response variable is omitted from the regression, the residual variance is overestimated, so that the variance of the estimator of the regression slopes can be higher than that of the ordinary least squares estimator.
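The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's own analysis: the data-generating setup (three independent predictors with very different variances, a response driven only by the lowest-variance one, unit noise variance) and the helper name `pcr_variance_summary` are assumptions made purely for illustration. It fits a regression on a chosen subset of principal components and reports the estimated residual variance together with the trace of the estimated covariance matrix of the slopes, s² · Σ 1/λ_j over the retained components.

```python
import numpy as np


def pcr_variance_summary(X, y, keep):
    """Regress y on the principal components indexed by `keep`.

    Returns the residual variance estimate s^2 and the trace of the
    estimated covariance matrix of the slope estimator.
    (Hypothetical helper, written for this illustration only.)
    """
    Xc = X - X.mean(axis=0)          # center predictors
    yc = y - y.mean()                # center response (absorbs the intercept)
    # Rows of Vt are principal directions, ordered by decreasing singular value;
    # the eigenvalues of Xc' Xc are the squared singular values.
    _, sing, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[keep].T         # scores on the retained components
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    resid = yc - scores @ gamma
    # Degrees of freedom: n minus retained components minus one for the mean.
    s2 = resid @ resid / (len(yc) - len(keep) - 1)
    slope_var_trace = s2 * np.sum(1.0 / sing[keep] ** 2)
    return s2, slope_var_trace


# Assumed toy setup: the response depends only on the lowest-variance predictor.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3)) * np.array([3.0, 1.0, 0.3])
y = 20.0 * X[:, 2] + rng.normal(size=n)   # true residual variance is 1

# Keeping all components reproduces ordinary least squares on centered data.
s2_ols, var_ols = pcr_variance_summary(X, y, keep=[0, 1, 2])
# Dropping the component with the smallest eigenvalue discards the signal.
s2_pcr, var_pcr = pcr_variance_summary(X, y, keep=[0, 1])

print("residual variance  OLS:", s2_ols, " PCR (top 2):", s2_pcr)
print("trace of slope cov OLS:", var_ols, " PCR (top 2):", var_pcr)
```

In this assumed setup the component with the smallest eigenvalue carries essentially all of the signal, so discarding it pushes that signal into the residuals. The inflated residual variance estimate then outweighs the benefit of dropping the 1/λ term for the small eigenvalue, and the estimated slope variance ends up larger than under ordinary least squares. With a weaker signal on the omitted component the comparison can go the other way, which is why the abstract says the variance "can be" higher.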