Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the number of training samples. In this paper, we study deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the rate of decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the target function depends on the full set of input variables, then the rate of decay of the error is inversely proportional to the input dimension. We conclude by computing this rate for a deep CNN trained on the output of another deep CNN with randomly initialised parameters. Interestingly, we find that, despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.