In this paper, we provide a geometric interpretation of the structure of Deep Learning (DL) networks characterized by $L$ hidden layers, a ramp activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, and input and output spaces ${\mathbb R}^Q$ of equal dimension $Q\geq1$. The hidden layers are likewise defined on spaces ${\mathbb R}^{Q}$. We apply our recent results on shallow neural networks to construct an explicit family of minimizers for the global minimum of the cost function in the case $L\geq Q$, which we show to be degenerate. In the context presented here, the hidden layers of the DL network "curate" the training inputs by recursive application of a truncation map that minimizes the noise-to-signal ratio of the training inputs. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function.
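To make the setting concrete, the following is a minimal sketch of a network of the kind described above: $L$ hidden layers on ${\mathbb R}^Q$ with a componentwise ramp activation, followed by an output layer, and an ${\mathcal L}^2$ cost over the training set. The function names, the affine output layer without activation, and the $1/N$ normalization of the cost are assumptions made for illustration; it does not reproduce the explicit minimizers constructed in the paper.

```python
import math

def ramp(v):
    """Ramp activation: componentwise max(t, 0)."""
    return [max(t, 0.0) for t in v]

def affine(W, b, v):
    """Return W v + b for a Q x Q matrix W (list of rows) and vectors b, v."""
    return [sum(w * t for w, t in zip(row, v)) + bi for row, bi in zip(W, b)]

def forward(x, weights, biases):
    """Apply L hidden layers (affine map, then ramp) and a final affine output layer.

    weights and biases have length L + 1; the last pair is the output layer,
    assumed here to carry no activation.
    """
    for W, b in zip(weights[:-1], biases[:-1]):
        x = ramp(affine(W, b, x))
    return affine(weights[-1], biases[-1], x)

def l2_cost(inputs, targets, weights, biases):
    """Root of the mean squared Euclidean error over the N training pairs
    (the 1/N normalization is an assumption of this sketch)."""
    total = 0.0
    for x, y in zip(inputs, targets):
        out = forward(x, weights, biases)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
    return math.sqrt(total / len(inputs))

def identity(q):
    """Q x Q identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(q)] for i in range(q)]

# Hypothetical example with Q = 3 and L = 3: identity weights and zero biases.
Q, L = 3, 3
weights = [identity(Q) for _ in range(L + 1)]
biases = [[0.0] * Q for _ in range(L + 1)]
inputs = [[1.0, 2.0, 3.0], [-1.0, 0.5, 2.0]]
outputs = [forward(x, weights, biases) for x in inputs]
cost = l2_cost(inputs, inputs, weights, biases)
```

With identity weights and zero biases, each hidden layer leaves nonnegative coordinates unchanged and sets negative ones to zero, which gives a simple picture of how repeated ramp layers can truncate the training inputs.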