In this paper, we consider the generalization ability of deep wide feedforward ReLU neural networks defined on a bounded domain $\mathcal X \subset \mathbb R^{d}$. We first demonstrate that the generalization ability of the neural network can be fully characterized by that of the corresponding deep neural tangent kernel (NTK) regression. We then investigate the spectral properties of the deep NTK and show that it is positive definite on $\mathcal{X}$ and that its eigenvalue decay rate is $(d+1)/d$. Using well-established results in kernel regression, we conclude that multilayer wide neural networks trained by gradient descent with proper early stopping achieve the minimax rate, provided that the regression function lies in the reproducing kernel Hilbert space (RKHS) associated with the corresponding NTK. Finally, we illustrate that overfitted multilayer wide neural networks cannot generalize well on $\mathbb S^{d}$. We believe that our technical contributions in determining the eigenvalue decay rate of the NTK on $\mathbb R^{d}$ might be of independent interest.
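To make the stated rates concrete, the following is a short worked sketch rather than a statement taken from the text above. The symbols $\lambda_i$ (the $i$-th eigenvalue of the NTK), $\beta$ (the decay exponent), and $n$ (the sample size) are notation assumed here for illustration. We also assume the standard kernel regression result that, for a kernel with polynomially decaying eigenvalues and a regression function in its RKHS, the minimax rate of the excess risk is $n^{-\beta/(\beta+1)}$.

$$
\lambda_i \asymp i^{-\beta}, \qquad \beta = \frac{d+1}{d}
\quad\Longrightarrow\quad
n^{-\frac{\beta}{\beta+1}} = n^{-\frac{(d+1)/d}{(2d+1)/d}} = n^{-\frac{d+1}{2d+1}} .
$$

Under these assumptions, $d = 1$ gives a rate of $n^{-2/3}$, and the exponent decreases toward $1/2$ as $d$ grows.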