We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.
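The estimator described above can be illustrated with a minimal NumPy sketch, under stated assumptions. It averages $M$ ridge fits on random subsamples of size $k$ (a finite approximation to the full ensemble over all possible subsamples) and evaluates a GCV-type risk estimate. The function names `ridge_svd`, `subsample_ridge_ensemble`, and `gcv_estimate` are hypothetical, and the GCV expression is the standard one for a linear smoother (full-data training error divided by $(1 - \mathrm{tr}(S)/n)^2$) applied to the averaged smoother; the exact estimator and normalization analyzed in the paper may differ.

```python
import numpy as np


def ridge_svd(Xs, ys, lam):
    """Ridge fit on one subsample via the SVD (hypothetical helper).

    Solves (Xs^T Xs / k + lam * I) beta = Xs^T ys / k. With lam = 0 this
    returns the minimum-norm least-squares (ridgeless) solution.
    Also returns the trace of the in-sample hat matrix.
    """
    k = Xs.shape[0]
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Drop numerically zero singular values so lam = 0 is well defined.
    tol = s.max() * max(Xs.shape) * np.finfo(float).eps
    keep = s > tol
    U, s, Vt = U[:, keep], s[keep], Vt[keep]
    shrink = s / (s**2 + k * lam)
    beta = Vt.T @ (shrink * (U.T @ ys))
    dof = float(np.sum(s * shrink))  # sum of s^2 / (s^2 + k * lam)
    return beta, dof


def subsample_ridge_ensemble(X, y, k, lam, M, rng):
    """Average M ridge fits, each on k rows drawn without replacement."""
    n, p = X.shape
    beta_bar = np.zeros(p)
    dof_bar = 0.0
    for _ in range(M):
        idx = rng.choice(n, size=k, replace=False)
        beta, dof = ridge_svd(X[idx], y[idx], lam)
        beta_bar += beta / M
        # Each component smoother is nonzero only on its own subsample, so
        # the trace of the averaged n x n smoother is the mean of these traces.
        dof_bar += dof / M
    return beta_bar, dof_bar


def gcv_estimate(X, y, beta, dof):
    """Assumed GCV form: full-data training error / (1 - tr(S)/n)^2."""
    n = X.shape[0]
    train_err = np.mean((y - X @ beta) ** 2)
    return train_err / (1.0 - dof / n) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, sigma = 300, 100, 1.0
    beta0 = rng.standard_normal(p) / np.sqrt(p)
    X = rng.standard_normal((n, p))
    y = X @ beta0 + sigma * rng.standard_normal(n)

    # Fresh data to compare the GCV estimate with an out-of-sample risk.
    X_new = rng.standard_normal((5000, p))
    y_new = X_new @ beta0 + sigma * rng.standard_normal(5000)

    for k in (120, 200, 300):  # subsample aspect ratio is p / k
        beta_hat, dof = subsample_ridge_ensemble(X, y, k, lam=0.0, M=50, rng=rng)
        gcv = gcv_estimate(X, y, beta_hat, dof)
        risk = np.mean((y_new - X_new @ beta_hat) ** 2)
        print(f"k={k:4d}  p/k={p / k:.2f}  GCV={gcv:.3f}  test risk={risk:.3f}")
```

In this sketch, tuning a ridgeless ensemble amounts to fixing `lam=0.0` and choosing the subsample size `k` that minimizes the GCV value, with no held-out split. Note that for `k = n` with `n > p` every subsample is the full data set, so the ensemble reduces to a single least-squares fit.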