In this paper, we prove that in the overparametrized regime, deep neural networks provide universal approximations and can interpolate any data set, as long as the activation function is locally in $L^1(\RR)$ and is not an affine function. Additionally, if the activation function is smooth and such an interpolating network exists, then the set of parameters that interpolate the data forms a manifold. Furthermore, we characterize the Hessian of the loss function evaluated at the interpolation points. In the last section, we provide a practical probabilistic method for finding such a point under general conditions on the activation function.