It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence (up to logarithmic factors) for learning functions from certain smooth function classes, provided the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression problem of estimating an unknown $d$-variate function using shallow ReLU neural networks. The regression function is assumed to lie either in the H\"older space of smoothness $\alpha<(d+3)/2$ or in the variation space corresponding to shallow neural networks, which can be viewed as the function class of an infinitely wide neural network. In this setting, we prove that least squares estimators based on shallow neural networks with certain norm constraints on the weights are minimax optimal, provided the network width is sufficiently large. As a byproduct, we derive a new size-independent bound for the local Rademacher complexity of shallow ReLU neural networks, which may be of independent interest.
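The abstract does not specify which norm constraint is imposed on the weights. As a minimal sketch, assuming a path-norm-style constraint $\sum_{k=1}^N |a_k|\,\|(w_k,b_k)\|_2 \le M$ on a width-$N$ network $f(x)=\sum_{k=1}^N a_k\max(w_k\cdot x+b_k,0)$ (our assumption, a bound often used with variation spaces), the Python code below approximates a constrained least squares estimator by projected gradient descent. The optimizer, the helper names (`project_l1_ball`, `normalize_and_project`, `fit_constrained_least_squares`, `path_norm`) and all hyperparameters are hypothetical illustrations, not the authors' method; the paper analyzes the estimator itself rather than any particular algorithm for computing it.

```python
import numpy as np

# ASSUMPTION: the norm constraint is the path-norm bound
#   sum_k |a_k| * ||(w_k, b_k)||_2 <= radius.
# All names and hyperparameters below are hypothetical.


def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius} (sort-based method)."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)    # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def normalize_and_project(W, b, a, radius):
    """Rescale each neuron so ||(w_k, b_k)||_2 = 1 (function-preserving, since ReLU
    is positively homogeneous), then project the outer weights a onto the l1 ball.
    Afterwards the path norm equals ||a||_1 <= radius."""
    norms = np.sqrt(np.sum(W**2, axis=1) + b**2)
    norms = np.maximum(norms, 1e-12)             # guard against all-zero neurons
    W, b, a = W / norms[:, None], b / norms, a * norms
    return W, b, project_l1_ball(a, radius)


def predict(X, W, b, a):
    """Shallow ReLU network f(x) = sum_k a_k * max(w_k . x + b_k, 0)."""
    return np.maximum(X @ W.T + b, 0.0) @ a


def path_norm(W, b, a):
    """sum_k |a_k| * ||(w_k, b_k)||_2 for the current parameters."""
    return float(np.sum(np.abs(a) * np.sqrt(np.sum(W**2, axis=1) + b**2)))


def fit_constrained_least_squares(X, y, width, radius, steps=500, lr=0.005, seed=0):
    """Projected gradient descent on the empirical squared loss over networks with
    path norm <= radius. Illustrative only: it need not reach the exact minimizer."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((width, d))
    b = rng.standard_normal(width)
    a = np.zeros(width)
    W, b, a = normalize_and_project(W, b, a, radius)
    for _ in range(steps):
        Z = X @ W.T + b                          # pre-activations, shape (n, width)
        H = np.maximum(Z, 0.0)
        g = 2.0 * (H @ a - y) / n                # d(mean squared loss)/d(prediction)
        grad_a = H.T @ g
        dZ = np.outer(g, a) * (Z > 0)            # backpropagate through the ReLU
        grad_W = dZ.T @ X
        grad_b = dZ.sum(axis=0)
        W, b, a = W - lr * grad_W, b - lr * grad_b, a - lr * grad_a
        W, b, a = normalize_and_project(W, b, a, radius)
    return W, b, a


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 1))
    y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(200)   # noisy toy target
    W, b, a = fit_constrained_least_squares(X, y, width=1000, radius=5.0)
    print("path norm:", path_norm(W, b, a))                # stays within the radius
    print("train MSE:", np.mean((predict(X, W, b, a) - y) ** 2))
```

Because ReLU is positively homogeneous, each neuron can be rescaled without changing $f$, so normalizing $(w_k,b_k)$ to unit norm turns the assumed path-norm constraint into an $\ell_1$ ball on the outer weights. Its radius $M$ does not depend on the width $N$, which gives an informal picture of why a width-independent constraint leaves room for over-parameterization.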