Hastie et al. (2022) found that ridge regularization is essential in high-dimensional linear regression $y=\beta^Tx + \epsilon$ with isotropic covariates $x\in \mathbb{R}^d$ and $n$ samples at fixed $d/n$. However, Hastie et al. (2022) also note that when the covariates are anisotropic and $\beta$ is aligned with the eigenvectors corresponding to the top eigenvalues of the population covariance, the "situation is qualitatively different." In the present article, we make this observation precise for linear regression with highly anisotropic covariances and diverging $d/n$. We find that simply scaling up (or inflating) the minimum $\ell_2$ norm interpolator by a constant greater than one can improve the generalization error, in sharp contrast to traditional regularization and shrinkage prescriptions. Moreover, we use a data-splitting technique to produce consistent estimators that achieve generalization error comparable to that of the optimally inflated minimum-norm interpolator. Our proof relies on apparently novel matching upper and lower bounds for expectations of Gaussian random projections for a general class of anisotropic covariance matrices when $d/n\to \infty$.
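For concreteness, a minimal sketch of the inflated estimator discussed above, assuming $X\in\mathbb{R}^{n\times d}$ collects the covariates, $y\in\mathbb{R}^n$ the responses, $X$ has full row rank, and $\alpha$ denotes the inflation constant (notation introduced here for illustration, not taken from the abstract), is
\[
\hat{\beta}_{\min} \;=\; X^{\top}\!\left(X X^{\top}\right)^{-1} y,
\qquad
\hat{\beta}_{\alpha} \;=\; \alpha\,\hat{\beta}_{\min}, \quad \alpha > 1,
\]
where $\hat{\beta}_{\min}$ is the minimum $\ell_2$ norm interpolator and $\hat{\beta}_{\alpha}$ its inflated version; the claim above is that a suitable constant $\alpha>1$ can yield smaller generalization error $\mathbb{E}\big[(y_{\mathrm{new}}-\hat{\beta}_{\alpha}^{\top}x_{\mathrm{new}})^2\big]$ than the uninflated choice $\alpha=1$.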