We develop an approach to efficiently growing neural networks, in which the parameterization and optimization strategies are designed by considering their effects on training dynamics. Unlike existing growing methods, which follow simple replication heuristics or rely on auxiliary gradient-based local optimization, we craft a parameterization scheme that dynamically stabilizes weight, activation, and gradient scaling as the architecture evolves, while maintaining the inference functionality of the network. To address the optimization difficulty that arises because subnetworks fading in at different growth phases receive imbalanced training effort, we propose a learning rate adaptation mechanism that rebalances the gradient contributions of these separate subcomponents. Experimental results show that our method achieves accuracy comparable to or better than training large fixed-size models, while saving a substantial portion of the original training computation budget. We demonstrate that these gains translate into real wall-clock training speedups.
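To make concrete what it means for growth to maintain the inference functionality of the network, the following is a minimal sketch, not the scheme proposed here. It assumes a two-layer ReLU network and uses a common function-preserving choice (zero-initialized outgoing weights for the new hidden units); the names `forward` and `grow_hidden` are hypothetical, and the sketch does not attempt the weight, activation, and gradient scale stabilization or the learning rate adaptation described above.

```python
import numpy as np


def forward(x, w1, w2):
    """Two-layer ReLU MLP: x -> relu(x @ w1) @ w2."""
    return np.maximum(x @ w1, 0.0) @ w2


def grow_hidden(w1, w2, extra, rng):
    """Widen the hidden layer by `extra` units without changing the output.

    Assumption for illustration: new incoming weights are random and new
    outgoing weights are zero, so the added units contribute nothing to the
    output until training updates them.
    """
    d_in = w1.shape[0]
    d_out = w2.shape[1]
    new_in = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, extra))
    new_out = np.zeros((extra, d_out))
    w1_big = np.concatenate([w1, new_in], axis=1)   # (d_in, hidden + extra)
    w2_big = np.concatenate([w2, new_out], axis=0)  # (hidden + extra, d_out)
    return w1_big, w2_big


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 4))

w1_big, w2_big = grow_hidden(w1, w2, extra=8, rng=rng)

# The grown network should compute the same outputs as the original one.
same = np.allclose(forward(x, w1, w2), forward(x, w1_big, w2_big))
print("outputs preserved after growth:", same)
```

In this sketch the new units start with zero outgoing weights and therefore begin training from a different state than the older units, which is one simple way to see why subnetworks added at different phases may need rebalanced learning rates.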