In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks, characterized by $L$ hidden layers, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces ${\mathbb R}^Q$ of equal dimension $Q\geq1$. The hidden layers are likewise defined on spaces ${\mathbb R}^{Q}$. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise-to-signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function.