We propose a new method for statistical inference in generalized linear models. In the overparameterized regime, Principal Component Regression (PCR) reduces variance by projecting high-dimensional data onto a low-dimensional principal subspace before fitting. However, PCR incurs truncation bias whenever the true regression vector has mass outside the retained principal components (PCs). To mitigate this bias, we propose Calibrated Principal Component Regression (CPCR), which first learns a low-variance prior in the PC subspace and then calibrates the model in the original feature space via a centered Tikhonov step. CPCR uses cross-fitting and controls the truncation bias by softening PCR's hard cutoff. Theoretically, we derive the out-of-sample risk in the random matrix regime and show that CPCR outperforms standard PCR when the regression signal has non-negligible components in low-variance directions. Empirically, CPCR consistently improves prediction across multiple overparameterized problems. These results highlight CPCR's stability and flexibility in modern overparameterized settings.
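The two-stage idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: squared loss (the linear model, the simplest GLM) instead of a general likelihood, centered data with no intercept, a prior obtained by plain PCR on the top `k` PCs, two-fold cross-fitting with the two calibrated fits averaged, and fixed, user-chosen `k` and `lam`. The function names `pcr_fit`, `centered_tikhonov`, and `cpcr_fit` are hypothetical. The centered Tikhonov step solves $\min_b \|y - Xb\|^2 + \lambda\|b - \beta_0\|^2$, whose solution is $b = \beta_0 + X^\top(XX^\top + \lambda I_n)^{-1}(y - X\beta_0)$.

```python
import numpy as np


def pcr_fit(X, y, k):
    """Hypothetical helper: regress y on the top-k principal components of X.

    Assumes X and y are centered. Returns coefficients in the original
    feature space; they are zero outside the span of the retained PCs.
    """
    # Right singular vectors of X are the principal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_k = Vt[:k].T                      # p x k basis of the retained PC subspace
    Z = X @ V_k                         # n x k PC scores
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return V_k @ gamma                  # lift back to R^p


def centered_tikhonov(X, y, beta0, lam):
    """Solve min_b ||y - X b||^2 + lam * ||b - beta0||^2 (squared loss assumed).

    Uses the dual (n x n) form b = beta0 + X^T (X X^T + lam I)^{-1} (y - X beta0),
    which is cheaper than the p x p form when p > n.
    """
    n = X.shape[0]
    resid = y - X @ beta0
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), resid)
    return beta0 + X.T @ alpha


def cpcr_fit(X, y, k, lam, seed=0):
    """Two-fold cross-fitted sketch: PCR prior on one fold, calibration on the other."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(X.shape[0]), 2)
    betas = []
    for a, b in [(folds[0], folds[1]), (folds[1], folds[0])]:
        beta0 = pcr_fit(X[a], y[a], k)                          # low-variance prior
        betas.append(centered_tikhonov(X[b], y[b], beta0, lam))  # soften the cutoff
    return np.mean(betas, axis=0)                                # average both folds


# Usage on synthetic overparameterized data (p > n).
rng = np.random.default_rng(1)
n, p = 200, 500
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)
b_hat = cpcr_fit(X, y, k=20, lam=10.0)
```

As `lam` grows, the calibrated estimate collapses to the PCR prior; as `lam` shrinks, it moves toward the minimum-norm interpolant shifted by the prior, so `lam` controls how far the fit may leave the PC subspace.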