We investigate how shallow ReLU networks interpolate between known regions. Our analysis shows that empirical risk minimizers converge to a minimum norm interpolant as the number of data points and the number of parameters tend to infinity, provided that weight decay regularization is applied with a coefficient that vanishes at a precise rate as the network width and the number of data points grow. Both with and without explicit regularization, we numerically study the implicit bias of common optimization algorithms towards known minimum norm interpolants.
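To make the regularized objective concrete, the following is a minimal sketch of an empirical risk with a weight decay penalty for a shallow ReLU network on scalar inputs. It rests on several assumptions not taken from the abstract: the network has the form f(x) = sum_k a_k * relu(w_k * x + b_k), the loss is the mean squared error, and weight decay penalizes the squared norm of all parameters, including biases. The exact parameterization and regularizer analyzed in the work may differ (for example, by excluding biases or rescaling by the width). The precise rate at which the coefficient `lam` must vanish is not stated here, so it is left as a free argument. The names `shallow_relu` and `regularized_risk` are hypothetical.

```python
def relu(z):
    """Rectified linear unit: max(z, 0)."""
    return max(z, 0.0)


def shallow_relu(x, params):
    """Evaluate f(x) = sum_k a_k * relu(w_k * x + b_k) for a scalar input x.

    params is a list of (a_k, w_k, b_k) tuples, one per hidden neuron
    (an assumed parameterization, chosen for illustration).
    """
    return sum(a * relu(w * x + b) for a, w, b in params)


def regularized_risk(params, data, lam):
    """Mean squared error on data plus lam times a weight decay penalty.

    Assumption: weight decay is the squared Euclidean norm of all
    parameters (outer weights, inner weights and biases). The vanishing
    rate of lam in terms of width and sample size is NOT encoded here.
    """
    mse = sum((shallow_relu(x, params) - y) ** 2 for x, y in data) / len(data)
    decay = sum(a * a + w * w + b * b for a, w, b in params)
    return mse + lam * decay


# Two neurons realizing relu(x) - relu(x - 1), which interpolates
# the three data points below exactly, so only the penalty remains.
params = [(1.0, 1.0, 0.0), (-1.0, 1.0, -1.0)]
data = [(0.0, 0.0), (0.5, 0.5), (2.0, 1.0)]
print(regularized_risk(params, data, 0.1))
```

Because this network fits the data exactly, the objective reduces to `lam` times the penalty (here 0.1 times 5). Among all interpolating parameter choices, driving `lam` toward zero favors those with the smallest penalty, which is the intuition behind convergence to a minimum norm interpolant.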