Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, but stop short of investigating the mechanisms that control such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends that strongly correlate with the test error. Building a connection between parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- which respectively control optimization dynamics around a critical point and bound model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization and on effective model complexity for networks trained in practice.