Learning algorithms that divide the data into batches are prevalent in many machine-learning applications, typically offering useful trade-offs between computational efficiency and performance. In this paper, we examine the benefits of batch partitioning through the lens of a minimum-norm overparameterized linear regression model with isotropic Gaussian features. We propose a natural small-batch version of the minimum-norm estimator and derive an upper bound on its quadratic risk, showing that, for the optimal choice of batch size, it is inversely proportional to the noise level as well as to the overparameterization ratio. In contrast to the minimum-norm estimator, our estimator exhibits stable risk behavior that is monotonically increasing in the overparameterization ratio, eliminating both the blowup at the interpolation point and the double-descent phenomenon. Interestingly, we observe that the implicit regularization offered by the batch partition is partially explained by feature overlap between the batches. Our bound is derived via a novel combination of techniques, in particular normal approximation in the Wasserstein metric of noisy projections over random subspaces.
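To make the setting concrete, the following is a minimal simulation sketch of a small-batch minimum-norm estimator, not the construction analyzed in the paper. It assumes that the per-batch minimum-norm solutions are combined by a plain average; the abstract does not specify the combination rule. The function names (`min_norm_estimator`, `batch_min_norm_estimator`, `simulate`) and all parameter values are hypothetical. With isotropic features, the quadratic risk of an estimate equals its squared Euclidean distance from the true parameter, which is what the sketch reports.

```python
import numpy as np


def min_norm_estimator(X, y):
    """Minimum-norm least-squares solution, computed via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def batch_min_norm_estimator(X, y, batch_size):
    """Average of per-batch minimum-norm solutions.

    Assumption for illustration only: batches are consecutive blocks of rows
    and the per-batch estimates are combined by a simple mean.
    """
    n_batches = X.shape[0] // batch_size  # samples that do not fill a batch are dropped
    estimates = []
    for i in range(n_batches):
        rows = slice(i * batch_size, (i + 1) * batch_size)
        estimates.append(min_norm_estimator(X[rows], y[rows]))
    return np.mean(estimates, axis=0)


def simulate(n=50, p=200, batch_size=5, noise_std=1.0, seed=0):
    """Draw one overparameterized problem (p > n) and compare both estimators."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(p)
    beta /= np.linalg.norm(beta)          # unit-norm true parameter
    X = rng.standard_normal((n, p))       # isotropic Gaussian features
    y = X @ beta + noise_std * rng.standard_normal(n)

    full = min_norm_estimator(X, y)
    batched = batch_min_norm_estimator(X, y, batch_size)
    # For isotropic features, quadratic risk equals squared estimation error.
    return {
        "full_error": float(np.sum((full - beta) ** 2)),
        "batch_error": float(np.sum((batched - beta) ** 2)),
    }


if __name__ == "__main__":
    print(simulate())
```

Sweeping `p / n`, `noise_std`, and `batch_size` in `simulate` and averaging over seeds is one way to explore empirically how the two estimators behave across overparameterization ratios. Setting `batch_size` equal to `n` recovers the ordinary minimum-norm estimator.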