Many mathematical convergence results for gradient descent (GD) based algorithms assume that the GD process is (almost surely) bounded, and in concrete numerical simulations, too, divergence of the GD process may slow down, or even completely rule out, convergence of the error function. In practically relevant learning problems, it thus seems advisable to design artificial neural network (ANN) architectures so that GD optimization processes remain bounded. The boundedness of GD processes for a given learning problem seems, however, to be closely related to the existence of minimizers in the optimization landscape; in particular, GD trajectories may escape to infinity if the infimum of the error function (objective function) is not attained in the optimization landscape. This naturally raises the question of whether minimizers exist in the optimization landscape. For shallow residual ANNs with multi-dimensional input layers and multi-dimensional hidden layers with the ReLU activation, the main result of this work answers this question affirmatively for a general class of loss functions and all continuous target functions. In our proof of this statement, we propose a kind of closure of the search space, where the limits are called generalized responses. Thereafter, we provide sufficient criteria for the loss function and the underlying probability distribution which ensure that all additional artificial generalized responses are suboptimal, which finally allows us to conclude the existence of minimizers in the optimization landscape.
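The link between a non-attained infimum and unbounded GD trajectories can be seen in a toy example that is not taken from this work. The following minimal sketch assumes the one-dimensional objective log(1 + exp(-w)), whose infimum 0 is approached only as w tends to infinity; the names `loss`, `grad`, and `run_gd`, the step size, and the number of steps are hypothetical choices made for illustration only.

```python
import math


def loss(w):
    # Objective log(1 + exp(-w)): strictly positive for every real w,
    # so the infimum 0 is never attained.
    return math.log1p(math.exp(-w))


def grad(w):
    # Derivative of log(1 + exp(-w)) with respect to w.
    return -1.0 / (1.0 + math.exp(w))


def run_gd(w0, lr, steps):
    # Plain gradient descent; returns the full parameter trajectory.
    trajectory = [w0]
    for _ in range(steps):
        w = trajectory[-1]
        trajectory.append(w - lr * grad(w))
    return trajectory


# Illustrative (assumed) hyperparameters.
trajectory = run_gd(w0=0.0, lr=1.0, steps=2000)

if __name__ == "__main__":
    for step in (0, 10, 100, 1000, 2000):
        w = trajectory[step]
        print(f"step={step:5d}  w={w:.4f}  loss={loss(w):.6f}")
```

In this sketch the loss keeps decreasing toward 0 while the parameter w keeps growing, so the GD process is unbounded although the error function converges. The example only illustrates the phenomenon in one dimension; it says nothing about the ANN setting treated in this work, where the main result rules this behavior out by establishing the existence of minimizers.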