We introduce a new debiasing framework for high-dimensional linear regression that removes the restrictions on the covariate distribution imposed by existing debiasing methods. We study the prevalent setting in which the numbers of features and samples are both large and of comparable order. In this setting, state-of-the-art debiasing methods use a degrees-of-freedom correction to remove the shrinkage bias of regularized estimators and to conduct inference. However, this approach requires that the samples be i.i.d., that the covariates follow a mean-zero Gaussian distribution, and that reliable estimates of the feature covariance matrix be available. Consequently, it struggles when (i) the covariates are non-Gaussian, with heavy-tailed or asymmetric distributions, (ii) the rows of the design exhibit heterogeneity or dependence, or (iii) reliable feature covariance estimates are unavailable. To address these limitations, we develop a strategy in which the debiasing correction is a rescaled gradient descent step (suitably initialized) whose step size is determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that the eigenvectors of this matrix are drawn uniformly from the orthogonal group. We show that this assumption holds in diverse settings where traditional debiasing fails, including designs with complex row-column dependence, heavy tails, asymmetry, and latent low-rank structure. We establish asymptotic normality of the proposed estimator (suitably centered and scaled) under several notions of convergence. Moreover, we develop a consistent estimator of its asymptotic variance. Lastly, we use our Spectrum-Aware approach to introduce a debiased Principal Components Regression (PCR) technique. In a range of simulations and real-data experiments, our method outperforms degrees-of-freedom debiasing by a substantial margin.
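To make the key assumption concrete (eigenvectors of the sample covariance matrix drawn uniformly from the orthogonal group), the sketch below builds a design X = U diag(s) Oᵀ whose right singular vectors O are Haar-distributed, so the eigenvectors of XᵀX are uniform on the orthogonal group while the spectrum s can be chosen freely (for example, heavy-tailed). This is an illustration of the assumption only, not the authors' debiasing estimator. The helper names `haar_orthogonal` and `rotationally_invariant_design` are hypothetical, and the Haar sampler uses the standard QR-with-sign-correction recipe.

```python
import numpy as np


def haar_orthogonal(p, rng):
    """Draw a p x p orthogonal matrix from the Haar (uniform) distribution.

    Uses the standard recipe: QR-decompose a Gaussian matrix, then flip
    each column of Q by the sign of the matching diagonal entry of R so
    the result is exactly Haar rather than biased by the QR convention.
    """
    z = rng.standard_normal((p, p))
    q, r = np.linalg.qr(z)
    return q * np.sign(np.diag(r))  # scales column j by sign(R[j, j])


def rotationally_invariant_design(n, p, singular_values, rng):
    """Hypothetical helper: build X = U diag(s) O^T with Haar U and O.

    The eigenvectors of X^T X are the columns of O (uniform on the
    orthogonal group), while its nonzero eigenvalues are s**2, which the
    caller controls. Returns the design and O.
    """
    k = min(n, p)
    s = np.asarray(singular_values, dtype=float)
    if s.shape != (k,):
        raise ValueError(f"expected {k} singular values, got shape {s.shape}")
    u = haar_orthogonal(n, rng)[:, :k]  # n x k, orthonormal columns
    o = haar_orthogonal(p, rng)         # p x p, Haar-distributed
    x = u @ np.diag(s) @ o[:, :k].T     # n x p design matrix
    return x, o


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 400, 200
    # Illustrative heavy-tailed singular values (an arbitrary choice).
    s = np.sqrt(n) * (1.0 + np.abs(rng.standard_t(df=3, size=p)))
    X, O = rotationally_invariant_design(n, p, s, rng)
    # Spectrum of the (uncentered) sample covariance matrix X^T X / n.
    eigvals = np.linalg.eigvalsh(X.T @ X / n)
    print(X.shape, eigvals.min(), eigvals.max())
```

Separating the spectrum (s) from the eigenvectors (O) mirrors the abstract's framing: the eigenvector assumption constrains only O, so heavy tails or other spectral features enter solely through s.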