Deep neural networks (NNs) with millions or billions of parameters can perform remarkably well on unseen data after being trained on a finite training set. Various prior theories have been developed to explain this excellent ability of NNs, but they do not provide meaningful bounds on the test error. Some recent theories, based on PAC-Bayes and mutual information, are non-vacuous and hence show great potential for explaining the excellent performance of NNs. However, they often require stringent assumptions and extensive modifications (e.g., compression, quantization) to the trained model of interest. Therefore, those prior theories provide guarantees for the modified versions only. In this paper, we propose two novel bounds on the test error of a model. Our bounds use the training set only and require no modification to the model. These bounds are verified on a large class of modern NNs, pretrained by PyTorch on the ImageNet dataset, and are non-vacuous. To the best of our knowledge, these are the first non-vacuous bounds at such a large scale that require no modification to the pretrained models.