Existing generalization bounds fail to explain crucial factors that drive the generalization of modern neural networks. Since such bounds often hold uniformly over all parameters, they suffer from over-parametrization and fail to account for the strong inductive bias of initialization and stochastic gradient descent. As an alternative, we propose a novel optimal transport interpretation of the generalization problem. This allows us to derive instance-dependent generalization bounds that depend on the local Lipschitz regularity of the learned prediction function in the data space. Therefore, our bounds are agnostic to the parametrization of the model and work well when the number of training samples is much smaller than the number of parameters. With small modifications, our approach yields accelerated rates for data on low-dimensional manifolds and guarantees under distribution shifts. We empirically analyze our generalization bounds for neural networks, showing that the bound values are meaningful and capture the effect of popular regularization methods during training.