Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and the bootstrap. The theoretical properties of such estimates have been extensively studied in low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observations $n$. However, a unifying methodology accompanied by a rigorous theory is lacking in high-dimensional settings. This paper studies the problem of risk estimation under the moderately high-dimensional asymptotic setting $n,p \rightarrow \infty$ and $n/p \rightarrow \delta>1$ (where $\delta$ is a fixed number), and proves the consistency of three risk estimates that have been successful in numerical studies, namely leave-one-out cross validation (LOOCV), approximate leave-one-out (ALO), and approximate message passing (AMP)-based techniques. A cornerstone of our analysis is a bound that we obtain on the discrepancy between the `residuals' obtained from AMP and LOOCV. This connection not only enables us to obtain more refined information on the estimates of AMP, ALO, and LOOCV, but also offers an upper bound on the convergence rate of each estimate.
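
To make the leave-one-out idea concrete, the following is a minimal sketch under assumptions not taken from the paper: it uses ridge regression with a fixed penalty `lam`, Gaussian synthetic data, and $n/p = 2$. For ridge, the leave-one-out residual has the closed form $(y_i - x_i^\top \hat\beta)/(1 - H_{ii})$, so the shortcut below is exact; for general regularizers, ALO is an approximation whose details are not reproduced here. The function names `ridge_fit`, `loocv_risk`, and `shortcut_risk` are hypothetical and chosen for illustration only.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate (no intercept): solve (X^T X + lam I) beta = X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def loocv_risk(X, y, lam):
    """Brute-force LOOCV: refit n times, each time leaving one row out."""
    n = X.shape[0]
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        beta_i = ridge_fit(X[mask], y[mask], lam)
        errs[i] = (y[i] - X[i] @ beta_i) ** 2
    return errs.mean()

def shortcut_risk(X, y, lam):
    """Single-fit leave-one-out risk via the hat-matrix diagonal (exact for ridge)."""
    p = X.shape[1]
    G = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)  # shape (p, n)
    h = np.einsum("ij,ji->i", X, G)                      # diagonal of H = X G
    resid = y - X @ (G @ y)                              # full-data residuals
    return np.mean((resid / (1.0 - h)) ** 2)

# Synthetic data in the moderately high-dimensional regime, n / p = 2 (assumed).
rng = np.random.default_rng(0)
n, p, lam = 300, 150, 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)
beta = rng.standard_normal(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

risk_loo = loocv_risk(X, y, lam)
risk_alo = shortcut_risk(X, y, lam)
print(risk_loo, risk_alo)
```

The brute-force version requires $n$ separate fits, whereas the shortcut needs only one, which is the computational motivation for single-fit estimates such as ALO.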
