Spectrum-Aware Debiasing: A Modern Inference Framework with Applications to Principal Components Regression

Debiasing is a fundamental concept in high-dimensional statistics. While degrees-of-freedom adjustment is the state-of-the-art technique in high-dimensional linear regression, it is limited to i.i.d. samples and sub-Gaussian covariates. These constraints hinder its broader practical use. Here, we introduce Spectrum-Aware Debiasing, a novel method for high-dimensional regression. Our approach applies to problems with structured dependencies, heavy tails, and low-rank structures. Our method achieves debiasing through a rescaled gradient descent step, deriving the rescaling factor using spectral information of the sample covariance matrix. The spectrum-based approach enables accurate debiasing in much broader contexts. We study the common modern regime where the number of features and samples scale proportionally. We establish asymptotic normality of our proposed estimator (suitably centered and scaled) under various convergence notions when the covariates are right-rotationally invariant. Such designs have garnered recent attention due to their crucial role in compressed sensing. Furthermore, we devise a consistent estimator for its asymptotic variance. Our work has two notable by-products: first, we use Spectrum-Aware Debiasing to correct bias in principal components regression (PCR), providing the first debiased PCR estimator in high dimensions. Second, we introduce a principled test for checking alignment between the signal and the eigenvectors of the sample covariance matrix. This test is independently valuable for statistical methods developed using approximate message passing, leave-one-out, or convex Gaussian min-max theorems. We demonstrate our method through simulated and real data experiments. Technically, we connect approximate message passing algorithms with debiasing and provide the first proof of the Cauchy property of vector approximate message passing (V-AMP).
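
To make the idea of a "rescaled gradient descent step" whose factor comes from the spectrum concrete, here is a minimal sketch for the ridge special case. It is an illustration under our own assumptions, not the paper's general procedure: the abstract does not state the rescaling formula, so the factor `adj` below is derived only from the observation that, under a rotationally invariant design, ridge shrinks the signal by df/p on average, where df is computed from the eigenvalues of X^T X. All function names (`ridge`, `spectral_adjustment`, `rescaled_gradient_step`) and the simulation settings are hypothetical.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def spectral_adjustment(X, lam):
    """Rescaling factor for ridge, computed only from the eigenvalues of X^T X.

    Assumption (ours, not quoted from the paper): adj is chosen so that
    1 + lam / adj = p / df, which undoes the average shrinkage df / p of
    ridge when the right singular vectors of X are uniformly random.
    """
    eig = np.clip(np.linalg.eigvalsh(X.T @ X), 0.0, None)
    p = eig.size
    df = np.sum(eig / (eig + lam))  # effective degrees of freedom of ridge
    return lam * df / (p - df), df

def rescaled_gradient_step(X, y, beta_hat, adj):
    """One gradient step on the squared loss, rescaled by 1 / adj."""
    return beta_hat + X.T @ (y - X @ beta_hat) / adj

# Synthetic data in the proportional regime (settings are arbitrary).
rng = np.random.default_rng(0)
n, p, lam = 200, 100, 5.0
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)

beta_hat = ridge(X, y, lam)
adj, df = spectral_adjustment(X, lam)
beta_debiased = rescaled_gradient_step(X, y, beta_hat, adj)
```

For ridge, the stationarity condition X^T(y - X beta_hat) = lam * beta_hat means this step simply multiplies `beta_hat` by p/df. With i.i.d. Gaussian covariates, `adj` is close to the classical degrees-of-freedom factor n - df in the proportional limit, while the spectral form can still be computed when the design is not i.i.d.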
