In this paper, we provide a strategy to determine the eigenvalue decay rate (EDR) of a large class of kernel functions defined on general domains rather than only on the sphere $\mathbb S^{d}$. This class includes, but is not limited to, the neural tangent kernels (NTKs) associated with neural networks of different depths and with various activation functions. After proving that the training dynamics of wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further show the minimax optimality of wide neural networks, provided that the ground-truth function $f$ lies in $[\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that an overfitted neural network cannot generalize well. We believe our approach for determining the EDR of kernels may also be of independent interest.
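To make the notion of an EDR on a non-spherical domain concrete, here is a minimal numerical sketch. It is an empirical check, not the paper's proof strategy. The eigenvalues of the normalized kernel matrix $K/n$ on $n$ i.i.d. samples approximate those of the kernel's integral operator, and a least-squares fit of $\log \lambda_i$ against $\log i$ gives a rough decay exponent $\beta$ in $\lambda_i \asymp i^{-\beta}$. Several choices here are assumptions made only for illustration: the two-layer ReLU NTK in one common normalization, the bias augmentation $x \mapsto (x, 1)$, the domain $[-1,1]^2$ with uniform samples, and the hypothetical helper names and fitting range `lo`/`hi`.

```python
import numpy as np


def relu_ntk_two_layer(X, Y):
    """Two-layer ReLU NTK in one common normalization (an assumption):
    K(x, y) = <x, y> k0(u) + |x| |y| k1(u), where u is the cosine of the angle."""
    nx = np.linalg.norm(X, axis=1)
    ny = np.linalg.norm(Y, axis=1)
    inner = X @ Y.T
    norms = np.outer(nx, ny)
    # Clip to guard arccos/sqrt against round-off slightly outside [-1, 1].
    u = np.clip(inner / norms, -1.0, 1.0)
    theta = np.arccos(u)
    k0 = (np.pi - theta) / np.pi                                   # arc-cosine kernel of order 0
    k1 = (u * (np.pi - theta) + np.sqrt(1.0 - u ** 2)) / np.pi     # arc-cosine kernel of order 1
    return inner * k0 + norms * k1


def empirical_edr(K, lo=10, hi=100):
    """Fit log(lambda_i) ~ -beta * log(i) + c over indices lo..hi (hypothetical range)."""
    n = K.shape[0]
    eig = np.linalg.eigvalsh(K / n)[::-1]      # eigenvalues of K/n, sorted descending
    idx = np.arange(lo, hi + 1)
    slope, _ = np.polyfit(np.log(idx), np.log(eig[idx - 1]), 1)
    return -slope, eig


# Hypothetical general (non-spherical) domain: uniform samples on [-1, 1]^2,
# augmented with a constant coordinate that plays the role of a bias term.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(800, 2))
Xt = np.hstack([X, np.ones((X.shape[0], 1))])

K = relu_ntk_two_layer(Xt, Xt)
beta, eig = empirical_edr(K)
print(f"estimated decay exponent beta = {beta:.2f}")
```

The estimate depends on $n$ and on the fitted index range. Very small indices are dominated by a few leading eigenvalues, and large indices are limited by the sample size, so the fit uses a middle range. The printed value should be read as a sanity check rather than as a substitute for the theoretical rate.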