The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training. Our method borrows techniques from constrained optimization and is based on first-order sensitivity information of the objective with respect to the virtual parameters that additional layers would offer if they were inserted. We consider fully connected feedforward networks with selected activation functions as well as residual neural networks. In numerical experiments, the proposed sensitivity-based layer insertion technique exhibits an improved decay of the training loss compared to not inserting the layer. Furthermore, the computational effort is reduced compared to inserting the layer from the beginning. The code is available at \url{https://github.com/LeonieKreis/layer_insertion_sensitivity_based}.
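To make the idea of "sensitivity with respect to virtual parameters" concrete, the following is a minimal NumPy sketch under explicitly hypothetical assumptions; it is not the implementation in the linked repository. It assumes a small two-hidden-layer tanh network, a candidate residual block of the form $h_k + V_k \tanh(W_k h_k)$ at position $k \in \{1, 2\}$ with $V_k = 0$ (so inserting it leaves the network function unchanged), fixed random inner weights $W_k$, a mean squared error loss, and the illustrative rule "insert where the gradient norm with respect to $V_k$ is largest". The names `forward`, `loss`, and `virtual_sensitivities` are hypothetical.

```python
import numpy as np

# Hypothetical base network (illustration only, not from the paper):
#   h1 = tanh(A1 x), h2 = tanh(A2 h1), y = c . h2
# Hypothetical candidate residual block at position k in {1, 2}:
#   h_k -> h_k + V_k tanh(W_k h_k), with V_k = 0 at insertion time.


def forward(A1, A2, c, X, virtual=None):
    """Return (hidden states, predictions); virtual = (k, V, W) inserts a block."""
    h1 = np.tanh(X @ A1.T)
    if virtual is not None and virtual[0] == 1:
        _, V, W = virtual
        h1 = h1 + np.tanh(h1 @ W.T) @ V.T
    h2 = np.tanh(h1 @ A2.T)
    if virtual is not None and virtual[0] == 2:
        _, V, W = virtual
        h2 = h2 + np.tanh(h2 @ W.T) @ V.T
    return (h1, h2), h2 @ c


def loss(A1, A2, c, X, t, virtual=None):
    """Mean squared error 0.5 * mean((y - t)^2)."""
    _, y = forward(A1, A2, c, X, virtual)
    return 0.5 * np.mean((y - t) ** 2)


def virtual_sensitivities(A1, A2, c, X, t, W1, W2):
    """Gradients of the loss w.r.t. the zero-initialized virtual weights V_1, V_2."""
    (h1, h2), y = forward(A1, A2, c, X)
    r = (y - t) / len(t)                  # dL/dy, shape (n,)
    g2 = np.outer(r, c)                   # dL/dh2, shape (n, m2)
    g1 = (g2 * (1.0 - h2 ** 2)) @ A2      # dL/dh1, shape (n, m1)
    # With V_k = 0 everything downstream of the block is unchanged, so
    # dL/dV_k = sum_i g_k[i] tanh(W_k h_k[i])^T, computed as a matrix product.
    G1 = g1.T @ np.tanh(h1 @ W1.T)
    G2 = g2.T @ np.tanh(h2 @ W2.T)
    return G1, G2


rng = np.random.default_rng(0)
n, d, m1, m2, p = 64, 3, 8, 8, 8
X = rng.normal(size=(n, d))
t = np.sin(X.sum(axis=1))                 # synthetic regression target
A1 = rng.normal(size=(m1, d)) / np.sqrt(d)
A2 = rng.normal(size=(m2, m1)) / np.sqrt(m1)
c = rng.normal(size=m2) / np.sqrt(m2)
W1 = rng.normal(size=(p, m1)) / np.sqrt(m1)  # hypothetical fixed inner weights
W2 = rng.normal(size=(p, m2)) / np.sqrt(m2)

G1, G2 = virtual_sensitivities(A1, A2, c, X, t, W1, W2)
norms = [np.linalg.norm(G1), np.linalg.norm(G2)]
best = int(np.argmax(norms)) + 1          # position with the largest sensitivity
print("sensitivity norms:", norms, "-> insert at position", best)
```

In this parametrization the inner weights $W_k$ must be nonzero: the gradient with respect to $W_k$ vanishes at $V_k = 0$, and with $W_k = 0$ the term $\tanh(W_k h_k)$ is zero as well, so the first-order information would be lost entirely.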