Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. Since such bounds often hold uniformly over all parameters, they suffer from over-parametrization and fail to account for the strong inductive bias of initialization and stochastic gradient descent. As an alternative, we propose a novel optimal transport interpretation of the generalization problem. This allows us to derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. Therefore, our bounds are agnostic to the parametrization of the model and work well when the number of training samples is much smaller than the number of parameters. With small modifications, our approach yields accelerated rates for data on low-dimensional manifolds and guarantees under distribution shifts. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.
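To make the notion of "local Lipschitz regularity of the learned prediction function in the data space" concrete, the following is a minimal sketch of how such a quantity might be estimated empirically around a single data point. It is not the bound derived in this work. The function name `local_lipschitz_estimate`, the sampling scheme, and the parameters `radius` and `n_samples` are all illustrative assumptions. The estimate only ever inspects the function `f` through its outputs on inputs, which is the sense in which a quantity of this kind is agnostic to how the model is parametrized.

```python
import numpy as np


def local_lipschitz_estimate(f, x, radius=0.1, n_samples=256, seed=0):
    """Sampled lower estimate of the local Lipschitz constant of f around x.

    Hypothetical helper for illustration only: it takes the largest
    difference quotient ||f(z) - f(x)|| / ||z - x|| over random points z
    with 0.1 * radius <= ||z - x|| <= radius.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # Random unit directions in the data space.
    directions = rng.normal(size=(n_samples, x.size))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)

    # Distances bounded away from zero so the quotient is well defined.
    distances = radius * rng.uniform(0.1, 1.0, size=(n_samples, 1))
    neighbours = x + distances * directions

    fx = np.atleast_1d(f(x))
    ratios = [
        np.linalg.norm(np.atleast_1d(f(z)) - fx) / np.linalg.norm(z - x)
        for z in neighbours
    ]
    return float(max(ratios))


if __name__ == "__main__":
    # Toy "prediction function": a fixed linear map, whose true Lipschitz
    # constant is the Euclidean norm of its weight vector.
    w = np.array([3.0, -4.0])
    estimate = local_lipschitz_estimate(lambda z: w @ z, x=[0.2, 0.7])
    print(estimate, "<=", np.linalg.norm(w))
```

Because the estimate is a maximum over finitely many sampled neighbours, it can only underestimate the true local constant. Averaging it over the training points would give one instance-dependent summary of how smooth the learned function is near the data, again under the assumptions stated above.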