We study over-parameterized classifiers for which learning by Empirical Risk Minimization (ERM) leads to zero training error. In these over-parameterized settings there are many global minima with zero training error, some of which generalize better than others. We show that, under certain conditions, the fraction of "bad" global minima with a true error larger than $\epsilon$ decays to zero exponentially fast with the number of training samples $n$. The bound depends on the distribution of the true error over the set of classifier functions used for the given classification problem, and does not necessarily depend on the size or complexity (e.g., the number of parameters) of that set. This might explain the unexpectedly good generalization of even highly over-parameterized neural networks. We validate our mathematical framework with experiments on a synthetic data set and a subset of MNIST, and also test our hypothesis with VGG19 and ResNet18 on a subset of Caltech101.
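The decay of the "bad" fraction can be illustrated numerically. The following is a minimal sketch, not the bound derived in the paper. It assumes the training samples are drawn i.i.d., so that a fixed classifier with true error e reaches zero training error on n samples with probability (1 - e)^n. The function name `bad_minima_fraction`, the Beta(2, 5) distribution of true errors, and the threshold 0.1 are hypothetical choices made only for illustration.

```python
import random


def bad_minima_fraction(true_errors, n, eps):
    """Expected share of zero-training-error classifiers whose true error exceeds eps.

    Assumption (illustrative): samples are i.i.d., so a classifier with true
    error e fits all n training samples with probability (1 - e) ** n.
    The result is a ratio of expected counts, not an exact finite-sample value.
    """
    weights = [(1.0 - e) ** n for e in true_errors]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no classifier is expected to reach zero training error")
    bad = sum(w for w, e in zip(weights, true_errors) if e > eps)
    return bad / total


# Hypothetical distribution of the true error over the classifier function set.
rng = random.Random(0)
errors = [rng.betavariate(2.0, 5.0) for _ in range(20000)]

sample_sizes = [0, 10, 50, 100, 200]
fracs = [bad_minima_fraction(errors, n, eps=0.1) for n in sample_sizes]
for n, frac in zip(sample_sizes, fracs):
    print(f"n = {n:3d}  expected bad fraction = {frac:.3e}")
```

In this sketch the result depends only on how the true errors are distributed over the function set, not on how many classifiers the set contains. Changing the assumed distribution changes how quickly the fraction shrinks.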