Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the number of training samples. In this paper, we study infinitely-wide deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the target function depends on the full set of input variables, then the error decay is controlled by the input dimension. We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that, despite their hierarchical structure, the functions generated by infinitely-wide deep CNNs are too rich to be efficiently learnable in high dimension.