With the growth of deep neural networks (DNNs), the number of DNN parameters has increased drastically, making DNN models hard to deploy on resource-limited embedded systems. To alleviate this problem, dynamic pruning methods have emerged. These methods search for diverse sparsity patterns during training and use the Straight-Through Estimator (STE) to approximate the gradients of pruned weights. STE allows pruned weights to be revived while dynamic sparsity patterns are being explored. However, these coarse gradients cause training instability and performance degradation because the STE approximation yields an unreliable gradient signal. In this work, to tackle this issue, we introduce refined gradients for updating the pruned weights by forming dual forwarding paths from two sets of weights (pruned and unpruned). We propose a novel method, Dynamic Collective Intelligence Learning (DCIL), which exploits the learning synergy between the collective intelligence of both weight sets. We verify the usefulness of the refined gradients by showing improved training stability and model performance on the CIFAR and ImageNet datasets. DCIL outperforms various previously proposed pruning schemes, including other dynamic pruning methods, while also training more stably.
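To make the STE baseline concrete, here is a minimal Python sketch of STE-based dynamic pruning. It illustrates the mechanism that DCIL is designed to improve on and is not DCIL itself. The sketch assumes a single linear unit, a squared-error loss, and a magnitude-based mask recomputed at every step. The names `magnitude_mask`, `ste_step`, and `train` are hypothetical. For a pruned weight, the exact gradient of the sparse model's loss is zero. STE substitutes the gradient taken with respect to the masked weight, and this is the coarse approximation the abstract refers to.

```python
from typing import List, Tuple


def magnitude_mask(weights: List[float], sparsity: float) -> List[float]:
    """Binary mask that zeroes the smallest-magnitude `sparsity` fraction of weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else 1.0 for i in range(len(weights))]


def ste_step(weights: List[float], x: List[float], target: float,
             sparsity: float, lr: float) -> Tuple[List[float], List[float]]:
    """One SGD step on L = 0.5 * (y - target)**2 with y = sum(w_i * m_i * x_i)."""
    mask = magnitude_mask(weights, sparsity)  # re-derived every step (dynamic sparsity)
    y = sum(w * m * xi for w, m, xi in zip(weights, mask, x))
    err = y - target  # dL/dy
    # Exact gradient: dL/dw_i = err * m_i * x_i, which is 0 for every pruned weight.
    # STE (hypothetical simplification): pass dL/d(w_i * m_i) = err * x_i straight
    # through the mask, so pruned weights still move and can regrow past the threshold.
    grads = [err * xi for xi in x]
    return [w - lr * g for w, g in zip(weights, grads)], mask


def train(weights: List[float], x: List[float], target: float, sparsity: float,
          lr: float, steps: int) -> Tuple[List[float], List[List[float]]]:
    """Run `steps` STE updates on a fixed input and record the mask used at each step."""
    masks: List[List[float]] = []
    for _ in range(steps):
        weights, mask = ste_step(weights, x, target, sparsity, lr)
        masks.append(mask)
    return weights, masks


if __name__ == "__main__":
    w, masks = train([0.05, 1.0], x=[1.0, 0.0], target=1.0, sparsity=0.5, lr=0.5, steps=3)
    print(masks)  # [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

Over these three steps, the weight at index 0 starts out pruned and contributes nothing to the output. The STE gradient still pushes it upward, and on the third step it overtakes index 1 and re-enters the mask. Because that gradient is computed as if the pruned weight were active, it only approximates the true effect of the weight.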