Overparametrization often helps improve generalization performance. This paper proposes a dual view of overparametrization, suggesting that downsampling may also help generalization. Motivated by this dual view, we characterize two out-of-sample prediction risks of the sketched ridgeless least squares estimator in the proportional regime $m\asymp n \asymp p$, where $m$ is the sketching size, $n$ the sample size, and $p$ the feature dimensionality. Our results reveal the statistical role of downsampling. Specifically, downsampling does not always hurt generalization performance, and may actually improve it in some cases. We identify the optimal sketching sizes that minimize the out-of-sample prediction risks, and find that the optimally sketched estimator has more stable risk curves that eliminate the peaks in those of the full-sample estimator. We then propose a practical procedure to empirically identify the optimal sketching size. Finally, we extend our results to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.
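As a rough illustration of the estimator the abstract refers to, the following is a minimal sketch, not the paper's implementation. It assumes an i.i.d. Gaussian sketching matrix, isotropic features (so the excess risk reduces to a squared parameter error), and a simulated linear model; the function names `sketched_ridgeless` and `excess_risk`, the noise level, and the grid of sketching sizes are all hypothetical choices made for this example.

```python
import numpy as np


def sketched_ridgeless(X, y, m, rng):
    """Minimum-norm least squares fit on the sketched data (S X, S y)."""
    n = X.shape[0]
    # Assumption: i.i.d. Gaussian sketching matrix of size m x n.
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    # The pseudoinverse returns the minimum-norm least squares solution,
    # which is the ridgeless (zero-penalty) limit of ridge regression.
    return np.linalg.pinv(S @ X) @ (S @ y)


def excess_risk(beta_hat, beta):
    """Out-of-sample excess risk, assuming isotropic features (Sigma = I)."""
    return float(np.sum((beta_hat - beta) ** 2))


# Simulated data from a well-specified linear model (illustrative values).
rng = np.random.default_rng(0)
n, p = 200, 100
beta = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# Excess risk for a few sketching sizes m, including m < p, m = p, and m > p.
risks = {
    m: excess_risk(sketched_ridgeless(X, y, m, rng), beta)
    for m in (50, 100, 150, 200)
}
```

Inspecting `risks` over a finer grid of `m` is one way to see how the risk varies with the sketching size in a single simulated data set; the values depend on the random seed and on the assumptions above.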