In this work, we instantiate a regularized form of the gradient clipping algorithm and prove that it can converge to the global minima of deep neural network loss functions, provided that the network is sufficiently wide. We present empirical evidence that our theoretically founded regularized gradient clipping algorithm is also competitive with state-of-the-art deep-learning heuristics. Hence, the algorithm presented here constitutes a new approach to rigorous deep learning. The modification we make to standard gradient clipping is designed to leverage the PL* condition, a variant of the Polyak-Lojasiewicz inequality that was recently proven to hold for various neural networks of any depth within a neighborhood of the initialization.
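The abstract does not spell out the regularized update. The sketch below assumes one plausible form: the clipping multiplier is floored at a constant delta, so the update is w <- w - eta * min(1, max(delta, gamma / ||grad L(w)||)) * grad L(w). Setting delta = 0 recovers standard norm-based clipping. The names eta, gamma, delta, the toy loss and the starting point are hypothetical illustrative choices, not taken from the paper. The toy loss L(w) = 0.5 * ||w||^2 satisfies the PL-type inequality 0.5 * ||grad L(w)||^2 >= mu * L(w) with mu = 1 everywhere.

```python
import math
from typing import List

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def regularized_clip_step(w: Vector, grad: Vector, eta: float,
                          gamma: float, delta: float) -> Vector:
    """One update w <- w - eta * h * grad, where
    h = min(1, max(delta, gamma / ||grad||)).

    ASSUMPTION: this floored multiplier is an illustrative guess at a
    "regularized" clipping rule, not the paper's verified algorithm.
    delta = 0 recovers standard gradient-norm clipping.
    """
    norm = l2_norm(grad)
    if norm == 0.0:
        return list(w)  # zero gradient: stationary point, no update
    h = min(1.0, max(delta, gamma / norm))
    return [wi - eta * h * gi for wi, gi in zip(w, grad)]


def loss(w: Vector) -> float:
    # Hypothetical toy loss L(w) = 0.5 * ||w||^2. Its gradient is w, and
    # 0.5 * ||grad L||^2 = L(w), so a PL-type inequality holds with mu = 1.
    return 0.5 * sum(x * x for x in w)


def train(w0: Vector, eta: float = 0.5, gamma: float = 1.0,
          delta: float = 0.1, steps: int = 200) -> Vector:
    w = list(w0)
    for _ in range(steps):
        grad = list(w)  # gradient of the toy loss at w
        w = regularized_clip_step(w, grad, eta, gamma, delta)
    return w


if __name__ == "__main__":
    start = [100.0, -50.0]
    print("regularized (delta=0.1):", loss(train(start, delta=0.1)))
    print("standard    (delta=0.0):", loss(train(start, delta=0.0)))
```

With these hypothetical settings, standard clipping (delta=0) moves w by only eta * gamma = 0.5 in norm per step while ||grad L|| > gamma. It therefore ends 200 steps with ||w|| near 11.8. The delta floor lets the step stay proportional to a large gradient, so the regularized run drives the toy loss close to zero. This is a property of the toy problem, not evidence for the paper's results.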