We propose a hierarchical training algorithm for standard feed-forward neural networks that adaptively extends the network architecture as soon as the optimization reaches a stationary point. By solving small (low-dimensional) optimization problems, the extended network provably escapes any local minimum or stationary point. Under some assumptions on the approximability of the data with stable neural networks, we show that the algorithm achieves an optimal convergence rate s, in the sense that the loss is bounded by the number of parameters raised to the power -s. As a byproduct, we obtain computable indicators that assess whether the training state of a given network is optimal, and we derive a new notion of generalization error.
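To illustrate the overall structure described above, the following is a minimal sketch (not the paper's actual algorithm): a feed-forward network is trained until the gradient is numerically stationary, the hidden layer is then widened so that the represented function is unchanged, and optimization resumes. The functions `widen` and `hierarchical_train`, the zero-initialization of the new neurons, and all tolerances are illustrative assumptions; in particular, the paper determines the extension by solving dedicated low-dimensional optimization problems, which is not reproduced here.

```python
# Illustrative sketch only: grow a one-hidden-layer network whenever training stalls.
import torch
import torch.nn as nn

def grad_norm(model):
    """L2 norm of the current parameter gradients."""
    return sum(p.grad.norm() ** 2 for p in model.parameters() if p.grad is not None).sqrt()

def widen(model, extra):
    """Return a wider copy of the network. Old weights are copied and the new neurons
    start at zero, so the represented function is unchanged (illustrative choice,
    not the extension step from the paper)."""
    old = model[0].out_features
    new = nn.Sequential(nn.Linear(model[0].in_features, old + extra), nn.ReLU(),
                        nn.Linear(old + extra, model[2].out_features))
    with torch.no_grad():
        new[0].weight[:old] = model[0].weight
        new[0].bias[:old] = model[0].bias
        new[0].weight[old:] = 0.0
        new[0].bias[old:] = 0.0
        new[2].weight[:, :old] = model[2].weight
        new[2].weight[:, old:] = 0.0
        new[2].bias.copy_(model[2].bias)
    return new

def hierarchical_train(x, y, width=4, rounds=5, steps=2000, tol=1e-4, extra=4):
    model = nn.Sequential(nn.Linear(x.shape[1], width), nn.ReLU(), nn.Linear(width, 1))
    loss_fn = nn.MSELoss()
    for _ in range(rounds):
        opt = torch.optim.Adam(model.parameters(), lr=1e-2)
        for _ in range(steps):
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            if grad_norm(model) < tol:   # (numerically) stationary point reached
                break
            opt.step()
        model = widen(model, extra)      # extend the architecture and continue training
    return model
```

Under the assumptions of the paper, each such extension step would decrease the loss at a rate tied to the approximability of the data, which is what yields the bound of the loss by the number of parameters to the power -s.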