We show that deep neural networks (DNNs) can efficiently learn any composition of functions with bounded $F_{1}$-norm, which allows DNNs to break the curse of dimensionality in ways that shallow networks cannot. More specifically, we derive a generalization bound that combines a covering number argument for compositionality with the $F_{1}$-norm (or the related Barron norm) for large-width adaptivity. We show that the global minimizer of the regularized loss of a DNN can fit, for example, a composition of two functions $f^{*}=h\circ g$ from a small number of observations, assuming $g$ is smooth/regular and reduces the dimensionality (e.g., $g$ could be the modulo map of the symmetries of $f^{*}$), so that $h$ can be learned in spite of its low regularity. The measures of regularity we consider are Sobolev norms with different levels of differentiability, which are well adapted to the $F_{1}$-norm. We compute scaling laws empirically and observe phase transitions depending on whether $g$ or $h$ is harder to learn, as predicted by our theory.
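For context, a common convention for the $F_{1}$-norm of a function representable by an infinitely wide one-hidden-layer network is the variation-norm definition sketched below; normalization details (choice of activation, handling of biases, constraints on the inner weights) differ across the literature, so this should be read as an illustrative convention rather than necessarily the exact definition adopted in this work:
\[
\|f\|_{F_{1}} \;=\; \inf\Big\{\, \|\mu\|_{\mathrm{TV}} \;:\; f(x) = \int \sigma\!\big(w^{\top}x + b\big)\, d\mu(w,b) \ \text{for all } x \,\Big\},
\]
where $\sigma$ is the ReLU activation, the infimum runs over signed measures $\mu$ on (unit-norm) weight-bias pairs $(w,b)$, and $\|\mu\|_{\mathrm{TV}}$ denotes the total-variation norm. For a finite-width network $f(x)=\sum_{i} a_{i}\,\sigma(w_{i}^{\top}x+b_{i})$ with $\|(w_{i},b_{i})\|=1$, this reduces to the infimum of $\sum_{i}|a_{i}|$ over all such representations.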