We investigate the learning dynamics of classifiers in scenarios where the classes are separable or the classifier is over-parameterized. In both cases, Empirical Risk Minimization (ERM) achieves zero training error. However, there are many global minima with zero training error, some of which generalize well and some of which do not. We show that in the separable-classes scenario the proportion of "bad" global minima diminishes exponentially with the number of training examples n. Our analysis provides bounds and learning curves that depend solely on the density distribution of the true error over the given set of classifier functions, irrespective of the set's size or complexity (e.g., number of parameters). This observation may shed light on the unexpectedly good generalization of over-parameterized neural networks. For the over-parameterized scenario, we propose a model for the density distribution of the true error that yields learning curves matching experiments on MNIST and CIFAR-10.
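To make the exponential-decay claim concrete, here is a minimal sketch of the standard argument, assuming i.i.d. training samples and writing p(ε) for the assumed density of the true error ε over the classifier set (notation hypothetical; the paper's precise bounds may differ):

```latex
% A fixed classifier f with true error \epsilon interpolates n i.i.d.
% samples (zero training error) with probability
\Pr\bigl[\widehat{\mathrm{err}}_n(f) = 0 \,\bigm|\, \mathrm{err}(f) = \epsilon\bigr]
  = (1 - \epsilon)^n .
% Hence the density of the true error among interpolating classifiers is
p_n(\epsilon) \;\propto\; p(\epsilon)\,(1 - \epsilon)^n ,
% so the proportion of "bad" interpolating minima with \mathrm{err}(f) \ge \epsilon_0
% is at most (1 - \epsilon_0)^n / Z_n, where Z_n = \int_0^1 p(\epsilon)(1-\epsilon)^n \, d\epsilon.
```

Provided p(ε) places nonzero mass near ε = 0, the normalizer Z_n decays only polynomially, so the fraction of bad global minima vanishes exponentially in n, and the rate depends only on p(ε), not on the size or parameter count of the function set.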