Training a neural network (NN) typically relies on some type of curve-following method, such as gradient descent (GD) and its stochastic variant (SGD), ADADELTA, ADAM, or limited-memory algorithms. Convergence of these algorithms usually requires a large number of observations to reach a high level of accuracy and, for certain classes of functions, may take multiple epochs over the data. Herein, we explore a different technique with the potential to converge dramatically faster, especially for shallow networks: it does not curve-follow but instead 'decouples' the hidden layers and updates their weighted connections through bootstrapping, resampling, and linear regression. By using resampled observations, this process is shown empirically to converge remarkably fast and to require fewer data points: in particular, our experiments show that approximating various classes of functions requires only a fraction of the observations needed by traditional neural network training methods.
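To make the ingredients named above concrete, the following is a minimal sketch and not the algorithm studied here. It assumes a single hidden layer whose weights are drawn at random and held fixed (a simplification swapped in for the decoupled hidden-layer update, which is not specified in this abstract), and it fits only the output weights by ridge-stabilized linear regression on bootstrap resamples, averaging the resulting coefficients. The names `fit_bootstrap_readout` and `predict`, the `tanh` activation, and all default parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def fit_bootstrap_readout(X, y, n_hidden=50, n_boot=20, ridge=1e-6, seed=0):
    """Fit the output layer of a one-hidden-layer network without gradients.

    Assumptions (illustrative only): hidden weights are random and fixed,
    and the output weights are the average of least-squares solutions
    computed on bootstrap resamples of the observations.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Random, fixed hidden layer (a simplifying assumption of this sketch).
    W = rng.normal(size=(d, n_hidden))
    b = rng.normal(size=n_hidden)

    # Hidden activations plus a constant column for the output bias.
    H = np.hstack([np.tanh(X @ W + b), np.ones((n, 1))])

    betas = []
    for _ in range(n_boot):
        # Bootstrap: draw n observations with replacement.
        idx = rng.integers(0, n, size=n)
        Hb, yb = H[idx], y[idx]
        # Linear regression via the normal equations; the small ridge term
        # only keeps the system well conditioned.
        A = Hb.T @ Hb + ridge * np.eye(Hb.shape[1])
        betas.append(np.linalg.solve(A, Hb.T @ yb))

    # Average the coefficients over all resamples.
    beta = np.mean(betas, axis=0)
    return W, b, beta


def predict(X, W, b, beta):
    """Evaluate the network with hidden weights (W, b) and readout beta."""
    H = np.hstack([np.tanh(X @ W + b), np.ones((len(X), 1))])
    return H @ beta
```

For example, with `X` of shape `(n, 1)` sampled from an interval and `y = np.sin(X[:, 0])`, calling `fit_bootstrap_readout(X, y)` followed by `predict` gives an approximation obtained from a single pass over the data, with no learning rate or epochs to tune. Each resample reduces to one linear solve, which is the property that makes regression-based updates attractive for shallow networks.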