Pruning of deep neural networks has been an effective technique for reducing model size while preserving most of the performance of dense networks, which is crucial for deploying models on memory- and power-constrained devices. While recent sparse learning methods have shown promising performance at moderate sparsity levels such as 95% and 98%, accuracy deteriorates quickly when sparsity is pushed to extreme levels. Obtaining sparse networks at such extreme sparsities presents unique challenges, such as fragile gradient flow and a heightened risk of layer collapse. In this work, we explore network performance beyond the commonly studied sparsities and propose a collection of techniques that enable continuous learning without accuracy collapse even at extreme sparsities, including 99.90%, 99.95%, and 99.99% on ResNet architectures. Our approach combines 1) Dynamic ReLU phasing, where DyReLU initially allows richer parameter exploration before being gradually replaced by standard ReLU; 2) weight sharing, which reuses parameters within a residual layer while maintaining the same number of learnable parameters; and 3) cyclic sparsity, where both the sparsity level and the sparsity pattern evolve dynamically throughout training to better encourage parameter exploration. We evaluate our method, which we term Extreme Adaptive Sparse Training (EAST), at extreme sparsities using ResNet-34 and ResNet-50 on CIFAR-10, CIFAR-100, and ImageNet, achieving significant performance gains over the state-of-the-art methods we compare against.
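The cyclic-sparsity idea above can be illustrated with a minimal sketch. The exact schedule EAST uses is not specified here, so the ramp shape, cycle count, and relaxation amplitude below are illustrative assumptions: the sparsity level ramps toward the target while periodic relaxations temporarily lower it, giving pruned weights a chance to regrow before the mask tightens again.

```python
import math

def cyclic_sparsity(step, total_steps, final_sparsity,
                    num_cycles=4, amplitude=0.05):
    """Hypothetical cyclic sparsity schedule (not the paper's exact one).

    Returns the fraction of weights to prune at `step`: a monotone
    cubic ramp toward `final_sparsity`, minus a superimposed cosine
    relaxation that periodically lowers sparsity so pruned weights
    can regrow and the sparsity pattern can evolve.
    """
    progress = step / total_steps
    # Monotone ramp: 0 at the start, final_sparsity at the end.
    ramp = final_sparsity * (1.0 - (1.0 - progress) ** 3)
    # Cosine relaxation: returns to zero at the end of each cycle,
    # so the schedule still lands exactly on final_sparsity.
    cycle = amplitude * (1.0 - math.cos(2 * math.pi * num_cycles * progress)) / 2.0
    return min(final_sparsity, max(0.0, ramp - cycle))
```

At each pruning interval the current sparsity from this schedule would be used to rebuild the mask (e.g. by magnitude), so both the level and the pattern change over training, matching the exploration behavior described above.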