This paper investigates the iterates $\hbb^1,\dots,\hbb^T$ obtained from iterative algorithms in high-dimensional linear regression problems, in the regime where the feature dimension $p$ is comparable with the sample size $n$, i.e., $p \asymp n$. The analysis and proposed estimators are applicable to Gradient Descent (GD), proximal GD, and their accelerated variants such as the Fast Iterative Soft-Thresholding Algorithm (FISTA). The paper proposes novel estimators for the generalization error of the iterate $\hbb^t$ for any fixed iteration $t$ along the trajectory. These estimators are proved to be $\sqrt n$-consistent under Gaussian designs. Applications to early stopping are provided: when the generalization error of the iterates is a U-shaped function of the iteration $t$, the estimates make it possible to select, from the data, an iteration $\hat t$ that achieves the smallest generalization error along the trajectory. Additionally, the paper provides a technique for developing debiasing corrections and valid confidence intervals for the components of the true coefficient vector from the iterate $\hbb^t$ at any finite iteration $t$. Extensive simulations on synthetic data illustrate the theoretical results.
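To make the early-stopping setting concrete, the following is a minimal simulation sketch, not the estimators proposed in the paper. It runs plain GD on a least-squares objective with $p \asymp n$ under an isotropic Gaussian design and tracks the oracle generalization error of each iterate, which is computable here only because the true coefficient vector is known in simulation. The function names, the zero initialization, the step size $1/L$, and all numerical settings (`n`, `p`, `sigma`, `T`) are illustrative assumptions.

```python
import numpy as np


def gd_iterates(X, y, step, T):
    """Return GD iterates b^1, ..., b^T for the loss ||y - X b||^2 / (2n), from b^0 = 0."""
    n, p = X.shape
    b = np.zeros(p)
    iterates = []
    for _ in range(T):
        grad = X.T @ (X @ b - y) / n  # gradient of the least-squares loss
        b = b - step * grad
        iterates.append(b.copy())
    return iterates


def oracle_generalization_error(b, b_star, sigma):
    """E[(y_new - x_new^T b)^2] for x_new ~ N(0, I_p) and noise variance sigma^2.

    Requires the true b_star, so it is available in simulation only.
    """
    return float(np.sum((b - b_star) ** 2) + sigma ** 2)


# Assumed synthetic setup: isotropic Gaussian design with p comparable to n.
rng = np.random.default_rng(0)
n, p, sigma, T = 200, 150, 1.0, 200
X = rng.standard_normal((n, p))
b_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ b_star + sigma * rng.standard_normal(n)

# Step size 1/L, where L is the largest eigenvalue of X^T X / n.
step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()

iterates = gd_iterates(X, y, step, T)
errors = [oracle_generalization_error(b, b_star, sigma) for b in iterates]

# Oracle choice of the stopping time (1-indexed to match b^1, ..., b^T).
t_best = int(np.argmin(errors)) + 1
print(f"oracle best iteration: {t_best}")
print(f"error at best iteration: {errors[t_best - 1]:.3f}")
print(f"error at final iteration: {errors[-1]:.3f}")
```

In practice the oracle error is unavailable because the true coefficient vector is unknown; the estimators described in the abstract are meant to replace it with a quantity computed from the data alone, so that $\hat t$ can be selected without access to the truth. Whether the error curve is actually U-shaped in this sketch depends on the assumed noise level and the ratio $p/n$.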