We study the stability of accuracy during the training of deep neural networks (DNNs). In this context, a DNN is trained by minimizing a cross-entropy loss function, and the performance metric is accuracy (the proportion of objects that are classified correctly). While training decreases the loss, the accuracy does not necessarily increase during the process and may sometimes even decrease. The goal of achieving stability of accuracy is to ensure that if accuracy is high at some initial time, it remains high throughout training. A recent result by Berlyand, Jabin, and Safsten introduces a doubling condition on the training data, which ensures the stability of accuracy during training for DNNs using the absolute value activation function. For training data in $\mathbb{R}^n$, this doubling condition is formulated using slabs in $\mathbb{R}^n$ and depends on the choice of the slabs. The goal of this paper is twofold. The first goal is to make the doubling condition uniform, that is, independent of the choice of slabs. This leads to sufficient conditions for stability in terms of the training data only. In other words, for a training set $T$ that satisfies the uniform doubling condition, there exists a family of DNNs such that a DNN from this family with high accuracy on the training set at some training time $t_0$ will have high accuracy for all times $t>t_0$. Moreover, establishing uniformity is necessary for the numerical implementation of the doubling condition. The second goal is to extend the original stability results from the absolute value activation function to a broader class of piecewise linear activation functions with finitely many critical points, such as the popular Leaky ReLU.
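The observation that loss can decrease while accuracy also decreases can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a two-class problem in which an object counts as correctly classified when the probability assigned to its true class exceeds 0.5, and the probability values and the helper names `cross_entropy` and `accuracy` are hypothetical, chosen only for illustration.

```python
import math

def cross_entropy(p_correct):
    """Mean cross-entropy loss, given the probability assigned to each object's true class."""
    return sum(-math.log(p) for p in p_correct) / len(p_correct)

def accuracy(p_correct):
    """Fraction of correctly classified objects (two classes: correct iff p > 0.5)."""
    return sum(p > 0.5 for p in p_correct) / len(p_correct)

# Hypothetical true-class probabilities at two training times (illustrative values only).
p_t0 = [0.51, 0.51, 0.51, 0.51]  # every object is classified correctly, but barely
p_t1 = [0.99, 0.99, 0.99, 0.45]  # three objects are confident, one is now misclassified

loss_t0, acc_t0 = cross_entropy(p_t0), accuracy(p_t0)
loss_t1, acc_t1 = cross_entropy(p_t1), accuracy(p_t1)

print(f"t0: loss={loss_t0:.4f}, accuracy={acc_t0:.2f}")
print(f"t1: loss={loss_t1:.4f}, accuracy={acc_t1:.2f}")
```

In this toy setting the loss falls between the two times because three objects become much more confidently classified, while the accuracy drops because one object that was only marginally correct crosses the decision boundary. Objects sitting close to the boundary are what make this possible, and the doubling condition on the training data is the assumption the abstract cites for ruling such instability out.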