Recent advances in deep learning have produced promising results on the generalization ability of deep neural networks; however, the literature still lacks a comprehensive theory explaining why heavily over-parametrized models generalize well while fitting the training data. In this paper we propose a PAC-type bound on the generalization error of feedforward ReLU networks by estimating the Rademacher complexity of the set of networks reachable from an initial parameter vector via gradient descent. The key idea is to bound the sensitivity of the network's gradient to perturbations of the input data along the optimization trajectory. The resulting bound does not depend explicitly on the depth of the network. We verify our results experimentally on the MNIST and CIFAR-10 datasets.
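
For orientation, a PAC-type bound obtained through Rademacher complexity typically takes the textbook form sketched below. This is the generic statement for a loss bounded in $[0,1]$, not the specific bound derived in the paper. The symbols are notation assumed here for illustration: $\mathcal{F}$ is the set of networks reachable by gradient descent, $S = (z_1, \dots, z_n)$ is an i.i.d. training sample, $\ell$ is the loss, $\sigma_i$ are independent uniform $\pm 1$ signs, and $\delta$ is the confidence parameter.

```latex
% Empirical Rademacher complexity of the loss class on the sample S
\hat{\mathfrak{R}}_S(\ell \circ \mathcal{F})
  = \mathbb{E}_{\sigma}\!\left[ \sup_{f \in \mathcal{F}}
      \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, \ell(f, z_i) \right]

% With probability at least 1 - \delta over the draw of S,
% simultaneously for every f in \mathcal{F}:
\mathbb{E}_{z}\big[\ell(f, z)\big]
  \;\le\; \frac{1}{n} \sum_{i=1}^{n} \ell(f, z_i)
  \;+\; 2\, \hat{\mathfrak{R}}_S(\ell \circ \mathcal{F})
  \;+\; 3 \sqrt{\frac{\log(2/\delta)}{2n}}
```

Under this template, the work of a result like the one described above lies in controlling $\hat{\mathfrak{R}}_S(\ell \circ \mathcal{F})$ for the restricted class $\mathcal{F}$, here through the sensitivity of the network's gradient to input perturbations along the optimization trajectory.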