We study depth separation in infinite-width neural networks, where complexity is controlled by the overall squared $\ell_2$-norm of the weights (the sum of squares of all weights in the network). Previous depth separation results focused on separation in terms of width, but such results give no insight into whether depth determines if it is possible to learn a network that generalizes well when the network width is unbounded. Here, we instead study separation in terms of the sample complexity required for learnability. Specifically, we show that there are functions that are learnable with sample complexity polynomial in the input dimension by norm-controlled depth-3 ReLU networks, yet are not learnable with sub-exponential sample complexity by norm-controlled depth-2 ReLU networks (for any value of the norm). We also show that a similar statement in the reverse direction is not possible: any function learnable with polynomial sample complexity by a norm-controlled depth-2 ReLU network with infinite width is also learnable with polynomial sample complexity by a norm-controlled depth-3 ReLU network.
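As a rough illustration of the complexity measure described above, the following sketch computes the sum of squared weights for a depth-2 and a depth-3 ReLU network. Several choices here are assumptions made only for illustration and are not taken from the text: the networks are bias-free, depth-2 is read as one hidden layer and depth-3 as two, a finite `width` stands in for the infinite-width setting, and the names `forward`, `squared_l2_cost`, `d`, and `width` are hypothetical.

```python
import numpy as np


def relu(z):
    """Elementwise ReLU activation."""
    return np.maximum(z, 0.0)


def forward(weights, x):
    """Evaluate a bias-free ReLU network (assumption: no bias terms).

    `weights` is a list of matrices; ReLU follows every layer except the
    last, which is a linear output layer.
    """
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)
    return weights[-1] @ h


def squared_l2_cost(weights):
    """Sum of squares of all weights in the network."""
    return sum(float(np.sum(W ** 2)) for W in weights)


rng = np.random.default_rng(0)
d, width = 5, 64  # hypothetical input dimension; finite stand-in for the width

# Assumed convention: depth-2 has one hidden layer, depth-3 has two.
depth2 = [rng.normal(size=(width, d)), rng.normal(size=(1, width))]
depth3 = [
    rng.normal(size=(width, d)),
    rng.normal(size=(width, width)),
    rng.normal(size=(1, width)),
]

x = rng.normal(size=d)
print("depth-2 output shape:", forward(depth2, x).shape)
print("depth-2 cost:", squared_l2_cost(depth2))
print("depth-3 cost:", squared_l2_cost(depth3))
```

In a norm-controlled setting, a quantity like `squared_l2_cost` would be the thing that is bounded or penalized, rather than the number of hidden units.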