Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space for modeling functions represented by neural networks, since the curse of dimensionality (CoD) cannot be evaded even when approximating a single ReLU neuron (Bach, 2017). In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms (e.g., the path norm, the Barron norm) from the perspective of sample complexity and generalization properties. First, we show that the path norm (as well as the Barron norm) yields width-independent sample complexity bounds, which allows for uniform convergence guarantees. Based on this result, we derive an improved metric entropy bound for $\epsilon$-covering of $\mathcal{O}(\epsilon^{-\frac{2d}{d+2}})$ (where $d$ is the input dimension and the constant depends at most polynomially on $d$) via the convex hull technique, which demonstrates a separation from kernel methods, which require $\Omega(\epsilon^{-d})$ to learn a target function in a Barron space. Second, this metric entropy result allows us to build a sharper generalization bound under a general moment hypothesis setting, achieving a rate of $\mathcal{O}(n^{-\frac{d+2}{2d+2}})$. Our analysis is novel in that it offers a sharper, refined estimate of the metric entropy (with a clear dependence on the dimension $d$) and handles unbounded sampling in the estimation of the sample error and the output error.
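As a quick arithmetic check on the exponents quoted in the abstract, the sketch below tabulates them for a few input dimensions. The function names are illustrative and do not come from the paper; the code only evaluates the stated formulas $\frac{2d}{d+2}$, $d$, and $\frac{d+2}{2d+2}$, and says nothing about the constants hidden in the bounds.

```python
from fractions import Fraction


def entropy_exponent(d: int) -> Fraction:
    """Exponent a in the metric entropy bound O(eps^{-a}), with a = 2d / (d + 2)."""
    return Fraction(2 * d, d + 2)


def kernel_exponent(d: int) -> Fraction:
    """Exponent in the kernel-method lower bound Omega(eps^{-d}) quoted in the abstract."""
    return Fraction(d)


def rate_exponent(d: int) -> Fraction:
    """Exponent b in the generalization rate O(n^{-b}), with b = (d + 2) / (2d + 2)."""
    return Fraction(d + 2, 2 * d + 2)


if __name__ == "__main__":
    # Exact fractions are used so the comparison is free of rounding error.
    for d in (1, 2, 10, 100, 1000):
        print(
            f"d={d:>4}  entropy exponent={float(entropy_exponent(d)):.4f}  "
            f"kernel exponent={float(kernel_exponent(d)):.1f}  "
            f"rate exponent={float(rate_exponent(d)):.4f}"
        )
```

Since $\frac{2d}{d+2} < 2$ for every $d \ge 1$, the metric entropy exponent stays bounded as the dimension grows, whereas the kernel exponent $d$ grows linearly. Likewise, $\frac{d+2}{2d+2} > \frac{1}{2}$ for every $d$ and approaches $\frac{1}{2}$ only in the limit $d \to \infty$.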