In this paper, we provide a strategy for determining the eigenvalue decay rate (EDR) of a large class of kernel functions defined on general domains rather than only on the sphere $\mathbb S^{d}$. This class includes, but is not limited to, the neural tangent kernels (NTKs) associated with neural networks of different depths and with various activation functions. After proving that the training dynamics of wide neural networks uniformly approximate those of neural tangent kernel regression on general domains, we further establish the minimax optimality of wide neural networks, provided that the underlying ground-truth function $f$ lies in $[\mathcal H_{\mathrm{NTK}}]^{s}$, an interpolation space associated with the RKHS $\mathcal{H}_{\mathrm{NTK}}$ of the NTK. We also show that overfitted neural networks cannot generalize well. We believe that our approach to determining the EDR of kernels may also be of independent interest.
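For orientation, the quantities named above can be written out as follows. This is a sketch in standard kernel-regression notation, not the paper's exact statement: the symbols $\lambda_i$, $e_i$, $\beta$, $\mu$ and $n$ are assumptions introduced here, and the rate in the last line is the form such results usually take under suitable conditions on $s$ and the noise.

$$
\begin{aligned}
&\text{Mercer decomposition w.r.t. a measure } \mu: && k(x,x') = \sum_{i \ge 1} \lambda_i\, e_i(x)\, e_i(x'), \qquad \lambda_1 \ge \lambda_2 \ge \dots > 0,\\
&\text{eigenvalue decay rate } \beta: && \lambda_i \asymp i^{-\beta},\\
&\text{interpolation space } (s > 0): && [\mathcal H]^{s} = \Big\{ \sum_{i \ge 1} a_i\, \lambda_i^{s/2}\, e_i \;:\; \sum_{i \ge 1} a_i^{2} < \infty \Big\}, \qquad [\mathcal H]^{1} = \mathcal H,\\
&\text{typical minimax rate over } [\mathcal H]^{s}: && \mathbb E\, \big\| \hat f - f \big\|_{L^{2}(\mu)}^{2} \asymp n^{-\frac{s\beta}{s\beta + 1}}.
\end{aligned}
$$

Read this way, knowing the EDR $\beta$ of the NTK on a general domain is what turns the approximation result into an explicit rate for wide networks.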