In this paper, we explicitly determine local and global minimizers of the $\mathcal{L}^2$ cost function in underparametrized Deep Learning (DL) networks; our main goal is to shed light on their geometric structure and properties. We accomplish this by a direct construction, without invoking the gradient descent flow at any point of this work. We specifically consider $L$ hidden layers, a ReLU ramp activation function, an $\mathcal{L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input and output spaces $\mathbb{R}^Q$ of equal dimension $Q\geq1$, and hidden layers also defined on $\mathbb{R}^{Q}$; the training inputs are assumed to be sufficiently clustered. The number of training inputs $N$ can be arbitrarily large; thus, we are considering the underparametrized regime. More general settings are left to future work. In the case $L\geq Q$, we construct an explicit family of minimizers attaining the global minimum of the cost function, and we show that this global minimum is degenerate. Moreover, we determine a set of $2^Q-1$ distinct degenerate local minima of the cost function. In the context presented here, the concatenation of hidden layers of the DL network is reinterpreted as a recursive application of a {\em truncation map} which "curates" the training inputs by minimizing their noise-to-signal ratio.
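To make the setting concrete, the following is a minimal NumPy sketch of the architecture described above: $L$ hidden layers acting on $\mathbb{R}^Q$ with ReLU activation, evaluated on $N$ training inputs stored as the columns of a $Q\times N$ matrix. Several details are assumptions made only for illustration and are not taken from the text: the affine output layer without activation (`W_out`, `b_out`), the $1/N$ normalization and square root in the cost, and all function names.

```python
import numpy as np

def relu(x):
    """ReLU ramp activation, applied componentwise."""
    return np.maximum(x, 0.0)

def forward(X0, weights, biases, W_out, b_out):
    """Propagate the training inputs through L hidden layers of width Q.

    X0      : (Q, N) array whose columns are the training inputs.
    weights : list of L arrays of shape (Q, Q), one per hidden layer.
    biases  : list of L arrays of shape (Q,), one per hidden layer.
    W_out, b_out : assumed affine output layer (no activation).
    """
    X = X0
    for W, b in zip(weights, biases):
        # Each hidden layer maps R^Q to R^Q, then applies the ReLU ramp.
        X = relu(W @ X + b[:, None])
    return W_out @ X + b_out[:, None]

def l2_cost(X_out, Y):
    """Hilbert-Schmidt (Frobenius) norm of the output error.

    The 1/N normalization and the square root are assumptions.
    """
    N = X_out.shape[1]
    return np.sqrt(np.sum((X_out - Y) ** 2) / N)
```

With identity weights and zero biases, inputs with nonnegative components pass through every layer unchanged, which gives a simple sanity check for the sketch.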