Learning generic high-dimensional tasks is notoriously hard, as it requires a number of training examples that is exponential in the dimension. Yet deep convolutional neural networks (CNNs) have shown remarkable success in overcoming this challenge. A popular hypothesis is that learnable tasks are highly structured and that CNNs leverage this structure to build a low-dimensional representation of the data. However, little is known about how much training data they require, and how this number depends on the data structure. This paper answers this question for a simple classification task that seeks to capture relevant aspects of real data: the Random Hierarchy Model. In this model, each of the $n_c$ classes corresponds to $m$ synonymic compositions of high-level features, which are in turn composed of sub-features through an iterative process repeated $L$ times. We find that the number of training data $P^*$ required by deep CNNs to learn this task (i) grows asymptotically as $n_c m^L$, which is only polynomial in the input dimensionality; (ii) coincides with the training set size at which the representation of a trained network becomes invariant to exchanges of synonyms; (iii) corresponds to the number of training data at which the correlations between low-level features and classes become detectable. Overall, our results indicate how deep CNNs can overcome the curse of dimensionality by building invariant representations, and provide an estimate of the number of data required to learn a task based on its hierarchically compositional structure.
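To make the generative process and the scaling $P^* \sim n_c m^L$ concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on assumptions that the abstract does not state: each composition is a tuple of `s` sub-features, every level draws from a vocabulary of `vocab` features, and the input dimension is therefore $d = s^L$. The function names (`sample_complexity`, `poly_exponent`, `make_rules`, `sample_input`) are hypothetical and chosen for illustration only.

```python
import itertools
import math
import random


def sample_complexity(n_c, m, L):
    """Asymptotic estimate P* ~ n_c * m**L quoted in the abstract."""
    return n_c * m ** L


def poly_exponent(m, s):
    """Exponent a with m**L == d**a, ASSUMING input dimension d = s**L."""
    return math.log(m) / math.log(s)


def make_rules(n_symbols, vocab, m, s, rng):
    """Assign each symbol m distinct s-tuples of lower-level features.

    The m tuples of one symbol play the role of its synonyms. Tuples are
    drawn without replacement, so no tuple belongs to two symbols.
    """
    tuples = list(itertools.product(range(vocab), repeat=s))
    if n_symbols * m > len(tuples):
        raise ValueError("not enough distinct tuples for unambiguous rules")
    rng.shuffle(tuples)
    return [tuples[i * m:(i + 1) * m] for i in range(n_symbols)]


def sample_input(label, levels, rng):
    """Expand a class label into s**L low-level features, level by level."""
    symbols = [label]
    for rules in levels:
        # Replace every symbol by one of its m synonymous tuples.
        symbols = [x for sym in symbols for x in rng.choice(rules[sym])]
    return symbols


rng = random.Random(0)
n_c, vocab, m, s, L = 4, 4, 2, 2, 3  # illustrative values, not from the paper

# One rule set per level: the top level maps class labels to tuples of
# high-level features, the remaining L - 1 levels map features to sub-features.
levels = [make_rules(n_c, vocab, m, s, rng)]
levels += [make_rules(vocab, vocab, m, s, rng) for _ in range(L - 1)]

x = sample_input(label=1, levels=levels, rng=rng)  # len(x) == s**L
p_star = sample_complexity(n_c, m, L)              # n_c * m**L
exponent = poly_exponent(m, s)                     # m**L == (s**L)**exponent
```

Under the assumed $d = s^L$, the estimate can be rewritten as $n_c m^L = n_c d^{\ln m / \ln s}$, which is the sense in which the growth is polynomial rather than exponential in the input dimension.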