Existing bounds on the generalization error of deep networks assume some form of smooth or bounded dependence on the input variable, but stop short of investigating the mechanisms that control such factors in practice. In this work, we present an extensive experimental study of the empirical Lipschitz constant of deep networks undergoing double descent, and highlight non-monotonic trends that correlate strongly with the test error. By connecting parameter-space and input-space gradients for SGD around a critical point, we isolate two important factors -- namely loss landscape curvature and distance of parameters from initialization -- which respectively control the optimization dynamics around a critical point and bound model function complexity, even beyond the training data. Our study presents novel insights on implicit regularization via overparameterization, and on the effective model complexity of networks trained in practice.
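To make the central quantity concrete, the following is a minimal sketch of one way an empirical Lipschitz constant can be estimated: as the largest spectral norm of the input-output Jacobian over a set of data points. The abstract does not state the exact estimator, so this choice (a maximum rather than, say, a mean over the data) is an assumption, as are the two-layer ReLU network, the random weights, and all function names, which are purely illustrative.

```python
import numpy as np


def relu_mlp_jacobian(x, W1, b1, W2):
    """Input-output Jacobian of f(x) = W2 @ relu(W1 @ x + b1) at the point x."""
    # ReLU derivative: 1 where the pre-activation is positive, 0 elsewhere.
    active = (W1 @ x + b1 > 0).astype(W1.dtype)
    # Equivalent to W2 @ diag(active) @ W1, without building the diagonal matrix.
    return W2 @ (active[:, None] * W1)


def empirical_lipschitz(X, W1, b1, W2):
    """Largest Jacobian spectral norm over the rows of X (assumed estimator)."""
    return max(np.linalg.norm(relu_mlp_jacobian(x, W1, b1, W2), 2) for x in X)


# Hypothetical toy setup: random weights and random inputs, not trained on anything.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 4))
b1 = rng.normal(size=16)
W2 = rng.normal(size=(3, 16))
X = rng.normal(size=(32, 4))

estimate = empirical_lipschitz(X, W1, b1, W2)
# Product of layer spectral norms: a standard worst-case upper bound.
upper_bound = np.linalg.norm(W2, 2) * np.linalg.norm(W1, 2)
```

The estimate is evaluated only at the given points and therefore lower-bounds the true Lipschitz constant, while the product of layer norms upper-bounds it. The gap between the two is one reason a data-dependent quantity of this kind can be more informative than a worst-case bound.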