Learning generic high-dimensional tasks is notoriously hard, as it requires a number of training examples exponential in the input dimension. Yet deep convolutional neural networks (CNNs) have shown remarkable success in overcoming this challenge. A popular hypothesis is that learnable tasks are highly structured and that CNNs leverage this structure to build a low-dimensional representation of the data. However, little is known about how much training data they require, and how this number depends on the data structure. This paper answers this question for a simple classification task that seeks to capture relevant aspects of real data: the Random Hierarchy Model. In this model, each of the $n_c$ classes corresponds to $m$ synonymic compositions of high-level features, which are in turn composed of sub-features through an iterative process repeated $L$ times. We find that the number of training examples $P^*$ required by deep CNNs to learn this task (i) grows asymptotically as $n_c m^L$, which is only polynomial in the input dimensionality; (ii) coincides with the training set size at which the representation of a trained network becomes invariant to exchanges of synonyms; (iii) corresponds to the number of training examples at which the correlations between low-level features and classes become detectable. Overall, our results indicate how deep CNNs can overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of training examples required to learn a task based on its hierarchically compositional structure.
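To make the scaling $P^* \sim n_c m^L$ concrete, the following minimal sketch evaluates it for a few depths and relates it to the input dimension. The abstract does not state how the input dimension depends on $L$; the sketch assumes, purely for illustration, that every feature expands into a fixed number `s` of sub-features at each level, so that the input has $d = s^L$ low-level features. Under that assumption $m^L = d^{\ln m / \ln s}$, which is one way to see why the estimate is polynomial in $d$. The function names and the parameter `s` are hypothetical, and constant prefactors are omitted.

```python
import math


def rhm_sample_complexity(n_c: int, m: int, L: int) -> int:
    """Asymptotic estimate P* ~ n_c * m**L from the abstract (prefactors omitted)."""
    return n_c * m ** L


def input_dimension(s: int, L: int) -> int:
    """ASSUMPTION (not in the abstract): each feature expands into s
    sub-features at every one of the L levels, giving d = s**L."""
    return s ** L


def polynomial_exponent(m: int, s: int) -> float:
    """Exponent a such that m**L == d**a when d = s**L (same assumption)."""
    return math.log(m) / math.log(s)


if __name__ == "__main__":
    n_c, m, s = 8, 4, 2  # illustrative values only
    a = polynomial_exponent(m, s)
    for L in (2, 3, 4):
        d = input_dimension(s, L)
        p_star = rhm_sample_complexity(n_c, m, L)
        # P* grows like n_c * d**a, i.e. polynomially in d.
        print(f"L={L}  d={d}  P*~{p_star}  n_c*d^a={n_c * d ** a:.0f}")
```

With these illustrative values the estimate grows as a power of $d$ when $L$ increases, in contrast to the exponential-in-$d$ requirement quoted for generic tasks.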