We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing techniques. We study the prevalent setting in which the number of features and the number of samples are both large and comparable. In this regime, the state-of-the-art approach uses a degrees-of-freedom correction to remove the shrinkage bias of regularized estimators and to conduct inference. However, this method requires that the observed samples be i.i.d., that the covariates follow a mean-zero Gaussian distribution, and that reliable estimates of the feature covariance matrix be available. It therefore struggles when (i) covariates are non-Gaussian, with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependence, or (iii) reliable feature covariance estimates are lacking. To address these limitations, we develop a new strategy in which the debiasing correction is a rescaled gradient descent step (suitably initialized) whose step size is determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that the eigenvectors of this matrix are uniform draws from the orthogonal group. We show that this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetry, and latent low-rank structure. We establish asymptotic normality of the proposed estimator (centered and scaled) under several notions of convergence, and we develop a consistent estimator of its asymptotic variance. Lastly, we use our Spectrum-Aware approach to construct a debiased Principal Component Regression (PCR) estimator. In varied simulations and real-data experiments, our method outperforms degrees-of-freedom debiasing by a substantial margin.
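The abstract frames the debiasing correction as a single rescaled gradient step whose step size depends on the spectrum of the sample covariance matrix, and it assumes that the eigenvectors of that matrix are Haar-distributed. The Python/NumPy sketch below illustrates both ingredients under stated assumptions; it is not the paper's estimator. It draws a design X = U diag(s) O^T with Haar-distributed U and O, so the eigenvectors of X^T X are uniform on the orthogonal group even when the (hypothetical) singular values s are heavy-tailed. It then fits ridge regression and applies one gradient step on the least-squares loss. The step size shown is the degrees-of-freedom baseline 1/(n - df), justified in that literature for isotropic Gaussian designs. For ridge, df is itself a function of the eigenvalues of X^T X. The Spectrum-Aware step size is not reproduced here. All function names (`haar_orthogonal`, `rotationally_invariant_design`, `ridge_with_df`, `gradient_step_debias`) and simulation settings are hypothetical.

```python
import numpy as np


def haar_orthogonal(p, rng):
    """Sample a p x p orthogonal matrix from the Haar (uniform) distribution."""
    z = rng.standard_normal((p, p))
    q, r = np.linalg.qr(z)
    # Multiply each column by the sign of R's diagonal so the law is exactly Haar.
    return q * np.sign(np.diag(r))


def rotationally_invariant_design(n, p, singular_values, rng):
    """Hypothetical design X = U diag(s) O^T with U, O independent Haar matrices.

    The eigenvectors of X^T X are columns of O, hence uniform on the orthogonal
    group, whatever (possibly heavy-tailed) singular values are supplied.
    """
    k = min(n, p)
    u = haar_orthogonal(n, rng)[:, :k]
    o = haar_orthogonal(p, rng)[:, :k]
    return (u * singular_values[:k]) @ o.T


def ridge_with_df(X, y, lam):
    """Ridge fit argmin 0.5*||y - Xb||^2 + 0.5*lam*||b||^2 and its degrees of freedom."""
    p = X.shape[1]
    gram = X.T @ X
    beta = np.linalg.solve(gram + lam * np.eye(p), X.T @ y)
    # df = tr(X (X^T X + lam I)^{-1} X^T) = sum_i d_i / (d_i + lam), with d_i the
    # eigenvalues of X^T X, so it depends on X only through its spectrum.
    d = np.clip(np.linalg.eigvalsh(gram), 0.0, None)
    df = float(np.sum(d / (d + lam)))
    return beta, df


def gradient_step_debias(X, y, beta_hat, step):
    """One gradient-descent step on 0.5*||y - Xb||^2 started at beta_hat."""
    return beta_hat + step * (X.T @ (y - X @ beta_hat))


# Usage: heavy-tailed singular values (a hypothetical choice) and a sparse signal.
rng = np.random.default_rng(0)
n, p = 400, 200
s = np.sqrt(n) * np.abs(rng.standard_t(3, size=min(n, p)))
X = rotationally_invariant_design(n, p, s, rng)
beta_true = np.zeros(p)
beta_true[:10] = 1.0
y = X @ beta_true + rng.standard_normal(n)

beta_ridge, df = ridge_with_df(X, y, lam=float(n))
# Degrees-of-freedom step size 1 / (n - df): the baseline for isotropic Gaussian
# designs. The Spectrum-Aware step size is NOT reproduced here.
beta_debiased = gradient_step_debias(X, y, beta_ridge, 1.0 / (n - df))
```

Replacing `1.0 / (n - df)` with a different spectrum-dependent scalar changes only the `step` argument. This is the sense in which the correction is a rescaled gradient step.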