Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch-partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We suggest a natural small-batch version of the minimum-norm estimator, and derive an upper bound on its quadratic risk, showing it is inversely proportional to the noise level as well as to the overparameterization ratio, for the optimal choice of batch size. In contrast to the minimum-norm estimator, our estimator exhibits stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that the implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation, in the Wasserstein metric, of noisy projections over random subspaces.
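To make the setting concrete, the following is a minimal sketch of a batch-partitioned minimum-norm estimator under isotropic Gaussian features. The abstract does not specify how the per-batch solutions are combined, so the plain averaging used here is an assumption made for illustration only, and the names `min_norm` and `batch_min_norm`, as well as the values of `n`, `p`, `sigma`, and the batch size, are hypothetical.

```python
import numpy as np


def min_norm(X, y):
    """Minimum-norm least-squares solution via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(X) @ y


def batch_min_norm(X, y, batch_size):
    """Average of per-batch minimum-norm solutions.

    ASSUMPTION: simple averaging is an illustrative choice; the source
    does not state how the batch estimates are combined.
    """
    n = X.shape[0]
    estimates = [
        min_norm(X[i:i + batch_size], y[i:i + batch_size])
        for i in range(0, n, batch_size)
    ]
    return np.mean(estimates, axis=0)


# Overparameterized regime: more features (p) than samples (n).
rng = np.random.default_rng(0)
n, p, sigma = 40, 200, 0.5            # hypothetical values
beta = rng.standard_normal(p) / np.sqrt(p)
X = rng.standard_normal((n, p))       # isotropic Gaussian features
y = X @ beta + sigma * rng.standard_normal(n)

beta_full = min_norm(X, y)                      # single batch: interpolates the data
beta_batch = batch_min_norm(X, y, batch_size=5)

# Squared parameter error, which equals the excess quadratic risk
# when the features are isotropic.
risk_full = float(np.sum((beta_full - beta) ** 2))
risk_batch = float(np.sum((beta_batch - beta) ** 2))
print(f"full min-norm: {risk_full:.3f}  batch (b=5): {risk_batch:.3f}")
```

Sweeping `batch_size` and the ratio `p / n` in this sketch is one way to explore numerically how the risk varies with the batch size and the overparameterization ratio; the result of a single random draw should not be read as evidence for or against the bound.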