In this paper, we approach the problem of cost (loss) minimization in underparametrized shallow neural networks through the explicit construction of upper bounds, without any use of gradient descent. A key focus is on elucidating the geometric structure of approximate and exact minimizers. We consider shallow neural networks with one hidden layer, a ReLU activation function, an ${\mathcal L}^2$ Schatten class (or Hilbert-Schmidt) cost function, input space ${\mathbb R}^M$, output space ${\mathbb R}^Q$ with $Q\leq M$, and a training input sample size $N>QM$ that can be arbitrarily large. We prove an upper bound on the minimum of the cost function of order $O(\delta_P)$, where $\delta_P$ measures the signal-to-noise ratio of the training inputs. In the special case $M=Q$, we explicitly determine an exact degenerate local minimum of the cost function, and show that the sharp value differs from the upper bound obtained for $Q\leq M$ by a relative error $O(\delta_P^2)$. The proof of the upper bound yields a constructively trained network; we show that it metrizes a particular $Q$-dimensional subspace in the input space ${\mathbb R}^M$. We comment on the characterization of the global minimum of the cost function in the given context.
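To make the setting concrete, the following is a minimal sketch of a shallow ReLU network evaluated with a Hilbert-Schmidt (Frobenius) cost. It is an illustration only, not the constructive training procedure of the paper. The hidden-layer width (taken equal to $M$ here), the absence of an activation on the output layer, the $1/\sqrt{N}$ normalization, and all names (`shallow_forward`, `l2_cost`, `W1`, `b1`, `W2`, `b2`) are assumptions introduced for this example; the random data are placeholders.

```python
import numpy as np


def relu(x):
    """Elementwise ReLU activation."""
    return np.maximum(x, 0.0)


def shallow_forward(X0, W1, b1, W2, b2):
    """One-hidden-layer network applied to inputs stored as columns.

    X0 has shape (M, N): N training inputs in R^M.
    The result has shape (Q, N): N outputs in R^Q.
    Assumption: no activation on the output layer.
    """
    hidden = relu(W1 @ X0 + b1[:, None])
    return W2 @ hidden + b2[:, None]


def l2_cost(X_out, Y):
    """Hilbert-Schmidt (Frobenius) norm of the residual.

    Assumption: normalized by 1/sqrt(N), N = number of samples.
    """
    n_samples = X_out.shape[1]
    return np.linalg.norm(X_out - Y, ord="fro") / np.sqrt(n_samples)


# Dimensions satisfying Q <= M and N > Q*M, as in the abstract.
M, Q, N = 4, 2, 20

rng = np.random.default_rng(0)
X0 = rng.normal(size=(M, N))   # placeholder training inputs
Y = rng.normal(size=(Q, N))    # placeholder target outputs

# Placeholder weights and biases; hidden width M is an assumption.
W1, b1 = rng.normal(size=(M, M)), np.zeros(M)
W2, b2 = rng.normal(size=(Q, M)), np.zeros(Q)

X_out = shallow_forward(X0, W1, b1, W2, b2)
cost = l2_cost(X_out, Y)
print(f"output shape: {X_out.shape}, cost: {cost:.4f}")
```

The abstract's results concern the minimum of such a cost over the weights and biases; the sketch only evaluates the cost at one arbitrary parameter choice.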