Overparametrization often improves generalization performance. This paper presents a dual view of overparametrization, suggesting that downsampling may also aid generalization. Focusing on the proportional regime $m\asymp n \asymp p$, where $m$ is the sketching size, $n$ is the sample size, and $p$ is the feature dimensionality, we investigate two out-of-sample prediction risks of the sketched ridgeless least squares estimator. Our findings challenge conventional beliefs by showing that downsampling does not always harm generalization and can in fact improve it in certain cases. We identify the optimal sketching size that minimizes the out-of-sample prediction risks and demonstrate that the optimally sketched estimator exhibits more stable risk curves, eliminating the peaks that appear in the risk curves of the full-sample estimator. To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size. Finally, we extend our analysis to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.
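As a rough illustration of the estimator discussed in the abstract, the following sketch fits a minimum-norm ("ridgeless") least squares solution on sketched data $(SX, Sy)$ for several sketching sizes $m$. Several choices here are assumptions made for illustration and are not taken from the paper: the Gaussian sketching matrix, the isotropic feature covariance (under which the excess prediction risk reduces to $\|\hat\beta - \beta\|^2$), the problem dimensions, and the helper names `sketched_ridgeless`, `gaussian_sketch`, and `excess_risk_isotropic`.

```python
import numpy as np

def sketched_ridgeless(X, y, S):
    """Minimum-norm least squares fit on the sketched data (S X, S y)."""
    return np.linalg.pinv(S @ X) @ (S @ y)

def gaussian_sketch(m, n, rng):
    """An m x n sketch with i.i.d. N(0, 1/m) entries (an assumed, common choice)."""
    return rng.normal(scale=1.0 / np.sqrt(m), size=(m, n))

def excess_risk_isotropic(beta_hat, beta):
    """Excess prediction risk assuming test features have identity covariance."""
    return float(np.sum((beta_hat - beta) ** 2))

# Synthetic data; all sizes and the noise level are illustrative assumptions.
rng = np.random.default_rng(0)
n, p, sigma = 400, 300, 1.0
beta = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta + sigma * rng.normal(size=n)

# Compare a few sketching sizes; m = n uses the identity, i.e. the full sample.
risks = {}
for m in (100, 200, 280, 400):
    S = gaussian_sketch(m, n, rng) if m < n else np.eye(n)
    risks[m] = excess_risk_isotropic(sketched_ridgeless(X, y, S), beta)
    print(m, round(risks[m], 3))
```

Sweeping `m` over a finer grid and averaging over repeated draws would trace out an empirical risk curve as a function of sketching size. This is only a simulation under the stated assumptions, not the empirical procedure the paper proposes.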