Deep neural networks (NNs) with millions or billions of parameters can perform remarkably well on unseen data after being trained on a finite training set. Various prior theories have been developed to explain this excellent ability of NNs, but they do not provide meaningful bounds on the test error. Some recent theories, based on PAC-Bayes and mutual information, are non-vacuous and hence show great potential for explaining the excellent performance of NNs. However, they often require stringent assumptions and extensive modifications (e.g., compression, quantization) to the trained model of interest. Therefore, those prior theories provide guarantees for the modified versions only. In this paper, we propose two novel bounds on the test error of a model. Our bounds use the training set only and require no modification to the model. These bounds are verified on a large class of modern NNs, pretrained by PyTorch on the ImageNet dataset, and are non-vacuous. To the best of our knowledge, these are the first non-vacuous bounds at such a large scale that require no modification to the pretrained models.