We introduce a new notion of complexity of functions and show that it has the following properties: (i) it governs a PAC-Bayes-like generalization bound, (ii) for neural networks it relates to natural notions of function complexity (such as the variation), and (iii) it explains the generalization gap between neural networks and linear schemes. While a large body of work describes bounds that have one of these properties in isolation, and some have two, to our knowledge this is the first notion that satisfies all three. Moreover, in contrast to previous works, our notion naturally generalizes to neural networks with several layers. Although computing our complexity is nontrivial in general, an upper bound is often easy to derive, even for networks with more layers and for functions with structure, such as periodic functions. Using one such upper bound, we show a separation between 2-layer and 4-layer neural networks in the number of samples needed to generalize well on periodic functions.