Two key tasks in high-dimensional regularized regression are tuning the regularization strength for accurate prediction and estimating the out-of-sample risk. It is known that the standard approach, $k$-fold cross-validation, is inconsistent in modern high-dimensional settings. While leave-one-out and generalized cross-validation remain consistent in some high-dimensional cases, they become inconsistent when samples are dependent or covariates are heavy-tailed. To model structured sample dependence and heavy tails, we use right-rotationally invariant covariate distributions, a crucial concept from compressed sensing. In the proportional asymptotics regime common in modern analyses, where the numbers of features and samples grow comparably, we introduce a new framework, ROTI-GCV, for reliably performing cross-validation. Along the way, we propose new estimators for the signal-to-noise ratio and the noise variance under these challenging conditions. We conduct extensive experiments that demonstrate the power of our approach and its superiority over existing methods.
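As a point of reference, the sketch below evaluates the standard GCV criterion for ridge regression on a right-rotationally invariant design. It is plain GCV, not ROTI-GCV itself. A design $X$ is right-rotationally invariant if $XO$ has the same distribution as $X$ for every fixed orthogonal $O$; one way to obtain this is $X = AO$ with $O$ Haar-distributed and independent of $A$. Standard GCV scores a penalty via $\mathrm{GCV}(\lambda) = \frac{1}{n}\lVert y - S_\lambda y\rVert^2 / (1 - \operatorname{tr}(S_\lambda)/n)^2$ with hat matrix $S_\lambda = X(X^\top X + n\lambda I)^{-1}X^\top$. The choice of $A$ (AR(1) dependence across samples plus heavy-tailed per-sample scaling), the $n\lambda$ penalty parametrization, and all function names are illustrative assumptions.

```python
import numpy as np


def haar_orthogonal(p, rng):
    """Sample a p x p orthogonal matrix from the Haar measure (QR with sign fix)."""
    g = rng.standard_normal((p, p))
    q, r = np.linalg.qr(g)
    # Flip column signs so the result is exactly Haar, not biased by the QR convention.
    return q * np.sign(np.diag(r))


def right_rotationally_invariant_design(n, p, rng, df=3.0, ar_rho=0.5):
    """Hypothetical construction X = A @ O, with O Haar and independent of A.

    A has AR(1)-correlated rows (dependent samples) and heavy-tailed
    per-row scales; right-multiplying by a Haar O makes X right-rotationally invariant.
    """
    z = rng.standard_normal((n, p))
    a = np.empty_like(z)
    a[0] = z[0]
    for i in range(1, n):
        # AR(1) dependence across samples (rows), stationary unit variance.
        a[i] = ar_rho * a[i - 1] + np.sqrt(1.0 - ar_rho**2) * z[i]
    # Heavy-tailed per-sample scaling via an inverse chi-square mixture (Student-t style).
    scale = np.sqrt(df / rng.chisquare(df, size=n))
    a *= scale[:, None]
    return a @ haar_orthogonal(p, rng)


def ridge_gcv(x, y, lambdas):
    """Standard GCV for ridge with estimator (X^T X + n*lam*I)^{-1} X^T y."""
    n = x.shape[0]
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    uty = u.T @ y
    # Squared norm of the part of y outside the column space of X (zero when n <= p).
    resid_perp = max(float(y @ y - uty @ uty), 0.0)
    scores = []
    for lam in lambdas:
        shrink = s**2 / (s**2 + n * lam)  # nonzero eigenvalues of the hat matrix
        rss = float(np.sum(((1.0 - shrink) * uty) ** 2)) + resid_perp
        dof = float(shrink.sum())  # tr(S_lambda)
        scores.append((rss / n) / (1.0 - dof / n) ** 2)
    return np.array(scores)


# Usage on synthetic data with n < p.
rng = np.random.default_rng(0)
n, p = 200, 300
x = right_rotationally_invariant_design(n, p, rng)
beta = rng.standard_normal(p) / np.sqrt(p)
y = x @ beta + rng.standard_normal(n)
lambdas = np.logspace(-3, 1, 20)
scores = ridge_gcv(x, y, lambdas)
best = lambdas[np.argmin(scores)]
```

Because $n < p$ in this example, $\operatorname{tr}(S_\lambda)$ approaches $n$ as $\lambda \to 0$, so the grid stays away from $\lambda = 0$, where the GCV denominator vanishes.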