Understanding what makes high-dimensional data learnable is a fundamental question in machine learning. On the one hand, it is believed that the success of deep learning lies in its ability to build a hierarchy of representations that become increasingly abstract with depth, going from simple features like edges to more complex concepts. On the other hand, learning to be insensitive to transformations that leave the task invariant, such as smooth transformations of images, has been argued to be important for deep networks, and this insensitivity strongly correlates with their performance. In this work, we aim to explain this correlation and unify these two viewpoints. We show that by introducing sparsity into generative hierarchical models of data, the task becomes insensitive to spatial transformations that are discrete versions of smooth transformations. In particular, we introduce the Sparse Random Hierarchy Model (SRHM), where we observe and rationalize that a hierarchical representation mirroring the hierarchical model is learnt precisely when such insensitivity is learnt, thereby explaining the strong correlation between insensitivity and performance. Moreover, we quantify how the sample complexity of CNNs learning the SRHM depends on both the sparsity and the hierarchical structure of the task.
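To make the idea of a sparse generative hierarchy concrete, the following is a minimal toy sketch, not the SRHM as defined in the paper. It assumes a construction chosen here for illustration: each symbol expands into `branching` lower-level symbols through one of several randomly drawn synonymous rules, and each child is placed at a random position inside a block of `pad + 1` slots, with the remaining slots left empty. All names (`make_rules`, `expand`, `sample`, `EMPTY`) and parameters are hypothetical, and the sketch does not enforce that rules of different symbols are distinct.

```python
import random

EMPTY = 0  # assumed marker for an uninformative (empty) slot


def make_rules(num_symbols, branching, num_synonyms, rng):
    """Draw, for each symbol 1..num_symbols, a list of random production rules.

    Each rule is a tuple of `branching` lower-level symbols. Rules are drawn
    independently, so collisions between symbols are possible in this toy.
    """
    return {
        sym: [
            tuple(rng.randrange(1, num_symbols + 1) for _ in range(branching))
            for _ in range(num_synonyms)
        ]
        for sym in range(1, num_symbols + 1)
    }


def expand(symbols, rules, branching, pad, rng):
    """Expand one level: every symbol becomes `branching` blocks of pad + 1 slots."""
    block = pad + 1
    out = []
    for sym in symbols:
        if sym == EMPTY:
            # An empty slot stays empty at every lower level.
            out.extend([EMPTY] * (branching * block))
            continue
        children = rng.choice(rules[sym])  # pick one synonymous rule
        for child in children:
            slots = [EMPTY] * block
            # Sparsity: the child sits at a random position inside its block.
            slots[rng.randrange(block)] = child
            out.extend(slots)
    return out


def sample(label, rules_per_level, branching, pad, rng):
    """Generate one input sequence from a class label, top level first."""
    symbols = [label]
    for rules in rules_per_level:
        symbols = expand(symbols, rules, branching, pad, rng)
    return symbols


rng = random.Random(0)
levels, branching, pad = 3, 2, 1
rules_per_level = [make_rules(4, branching, 2, rng) for _ in range(levels)]
x = sample(1, rules_per_level, branching, pad, rng)
print(len(x), sum(v != EMPTY for v in x))
```

In this toy, moving an informative symbol to another slot of its block changes the input but not the symbols it was generated from, which is one simple way to picture a discrete analogue of a small spatial deformation that leaves the label unchanged.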