When training deep neural networks, the phenomenon of $\textit{dying neurons}$ (units that become inactive or saturated and output zero during training) has traditionally been viewed as undesirable: it is linked with optimization challenges and contributes to plasticity loss in continual learning scenarios. In this paper, we reassess this phenomenon through the lens of sparsity and pruning. By systematically exploring how various hyperparameter configurations affect dying neurons, we show that they can be used to build simple yet effective structured pruning algorithms. We introduce $\textit{Demon Pruning}$ (DemP), a method that controls the proliferation of dead neurons during training and thereby sparsifies the network dynamically. DemP combines noise injection on active units with a one-cycle regularization schedule, and it stands out for its simplicity and broad applicability. Experiments on the CIFAR10 and ImageNet datasets demonstrate that DemP surpasses existing structured pruning techniques, achieving better accuracy-sparsity tradeoffs and training speedups. These findings suggest a novel perspective on dying neurons as a valuable resource for efficient model compression and optimization.
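To make the idea concrete, the following is a minimal sketch, not the DemP implementation. It assumes ReLU units, treats a unit as dead when it outputs zero on every input of a probe batch, and uses a half-sine curve as one possible shape for a one-cycle regularization schedule. The function names (`one_cycle`, `dead_units`, `prune`), the half-sine shape, and the toy weights are all illustrative assumptions; the abstract does not specify them. Noise injection on active units is omitted.

```python
import math


def one_cycle(step, total_steps, peak):
    """Hypothetical one-cycle schedule: regularization strength rises from 0
    to `peak` at mid-training and returns to (approximately) 0 at the end."""
    return peak * math.sin(math.pi * step / total_steps)


def relu(x):
    return max(0.0, x)


def dead_units(weights, biases, batch):
    """Return indices of ReLU units that output zero for every input in `batch`.

    weights[j] is the incoming weight vector of unit j, biases[j] its bias.
    """
    dead = []
    for j, (w, b) in enumerate(zip(weights, biases)):
        outputs = (relu(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in batch)
        if all(out == 0.0 for out in outputs):
            dead.append(j)
    return dead


def prune(weights, biases, dead):
    """Structured pruning: drop whole units (rows) listed in `dead`."""
    dead_set = set(dead)
    keep = [j for j in range(len(weights)) if j not in dead_set]
    return [weights[j] for j in keep], [biases[j] for j in keep]


# Toy layer with three units and two inputs (made-up values for illustration).
weights = [[1.0, 0.5], [-1.0, -2.0], [0.3, -0.1]]
biases = [0.0, -0.5, 0.1]
batch = [[1.0, 2.0], [0.5, 0.0], [2.0, 1.0]]

dead = dead_units(weights, biases, batch)  # unit 1 never activates on this batch
pruned_weights, pruned_biases = prune(weights, biases, dead)
```

In a real training loop, the schedule value would scale a regularization term at each step, and dead units would be removed together with the matching columns of the next layer's weight matrix.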