The ability of overparameterized deep networks to interpolate noisy data while still generalizing well has recently been characterized in terms of the double descent curve for the test error. Common intuition from polynomial regression suggests that overparameterized networks are able to sharply interpolate noisy data without deviating considerably from the ground-truth signal, thus preserving their generalization ability. At present, a precise characterization of the relationship between interpolation and generalization for deep networks is missing. In this work, we quantify the sharpness of fit of the training data interpolated by neural network functions by studying the loss landscape with respect to the input variable locally around each training point, over volumes around cleanly and noisily labelled training samples, as we systematically increase the number of model parameters and training epochs. Our findings show that loss sharpness in the input space follows both model-wise and epoch-wise double descent, with worse peaks observed around noisy labels. While small interpolating models sharply fit both clean and noisy data, large interpolating models express a smooth loss landscape in which, contrary to existing intuition, noisy targets are predicted over large volumes around training data points.
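To make the notion of input-space loss sharpness concrete, the following is a minimal sketch under stated assumptions, not the procedure used in this work. It assumes sharpness can be proxied by the average norm of the loss gradient with respect to the input, estimated over points sampled uniformly from a ball around a training point. The names `input_gradient`, `input_sharpness`, `make_loss`, and the parameters `radius`, `n_samples`, and `eps` are hypothetical, and the gradient is approximated by central finite differences so that the example needs only NumPy.

```python
import numpy as np

def input_gradient(loss_fn, x, y, eps=1e-4):
    """Central finite-difference gradient of loss_fn(x, y) with respect to x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step.flat[i] = eps
        grad.flat[i] = (loss_fn(x + step, y) - loss_fn(x - step, y)) / (2 * eps)
    return grad

def input_sharpness(loss_fn, x, y, radius=0.1, n_samples=64, seed=0):
    """Mean input-gradient norm of the loss over a ball of given radius around x.

    Assumption: this Monte Carlo average is only a proxy for sharpness of fit.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    norms = []
    for _ in range(n_samples):
        # Uniform direction on the unit sphere.
        direction = rng.standard_normal(x.shape)
        direction /= np.linalg.norm(direction)
        # Radius scaled so that samples are uniform inside the ball.
        r = radius * rng.uniform() ** (1.0 / x.size)
        grad = input_gradient(loss_fn, x + r * direction, y)
        norms.append(np.linalg.norm(grad))
    return float(np.mean(norms))

def make_loss(w):
    """Squared error of a toy linear model f(x) = w . x (hypothetical stand-in)."""
    return lambda x, y: float((np.dot(w, x) - y) ** 2)

w = np.array([1.0, -2.0])
x_train, y_train = np.array([0.5, 0.25]), 1.0
smooth = input_sharpness(make_loss(0.1 * w), x_train, y_train)
sharp = input_sharpness(make_loss(10.0 * w), x_train, y_train)
```

In this toy setting the steeper model yields the larger estimate (`sharp > smooth`). For a real network, the finite-difference gradient would normally be replaced by automatic differentiation, and the choice of neighbourhood and norm would follow the definitions given in the body of the paper.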