In recent years, the increasing size of deep learning models and their growing demand for computational resources have drawn significant attention to pruning neural networks while preserving their accuracy. In unstructured gradual pruning, which sparsifies a network by gradually removing individual parameters until a target sparsity is reached, recent works show that both gradient and weight magnitudes should be considered. In this work, we show that the design of this mechanism, i.e., the order of prioritization and the selection criteria, is essential. We introduce a gradient-first, magnitude-next strategy for choosing the parameters to prune, and show that a fixed-rate subselection criterion between these two steps works better than the annealing approach used in the literature. We validate this on the CIFAR-10 dataset with multiple randomized initializations, on both VGG-19 and ResNet-50 network backbones, for pruning targets of 90%, 95%, and 98% sparsity, and for both initially dense and 50%-sparse networks. Our proposed fixed-rate gradient-first gradual pruning (FGGP) approach outperforms its state-of-the-art alternatives in most of these experimental settings, occasionally even surpassing the upper bound given by the corresponding dense-network results, and achieves the highest overall ranking across the considered settings.
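To make the selection order concrete, the sketch below illustrates one possible gradient-first, magnitude-next pruning step with a fixed-rate subselection between the two criteria. This is a minimal illustration, not the paper's implementation: the function name, the `candidate_rate` parameter, and the choice to rank candidates by smallest gradient magnitude are assumptions made here for exposition.

```python
# Minimal sketch (assumed, not the authors' code) of a gradient-first,
# magnitude-next pruning step with a fixed subselection rate.
import torch


def fggp_style_prune_step(weights: torch.Tensor,
                          grads: torch.Tensor,
                          mask: torch.Tensor,
                          n_to_prune: int,
                          candidate_rate: float = 2.0) -> torch.Tensor:
    """Return an updated binary mask with `n_to_prune` additional weights removed.

    Step 1 (gradient-first): among currently active weights, form a candidate
    pool of the `candidate_rate * n_to_prune` entries ranked by gradient
    magnitude (smallest first is an assumption here).
    Step 2 (magnitude-next): from that pool, prune the `n_to_prune` entries
    with the smallest weight magnitude. The fixed `candidate_rate` stands in
    for the fixed-rate subselection, as opposed to an annealed schedule.
    """
    active = mask.bool()
    # Inactive weights get an infinite score so they are never selected again.
    grad_score = torch.where(active, grads.abs(),
                             torch.full_like(grads, float("inf")))

    pool_size = min(int(candidate_rate * n_to_prune), int(active.sum()))
    # Gradient-first: candidate indices with the smallest gradient magnitude.
    pool_idx = torch.topk(grad_score.flatten(), pool_size, largest=False).indices

    # Magnitude-next: within the pool, pick the smallest-magnitude weights.
    pool_weights = weights.flatten()[pool_idx].abs()
    k = min(n_to_prune, pool_size)
    prune_local = torch.topk(pool_weights, k, largest=False).indices
    prune_idx = pool_idx[prune_local]

    new_mask = mask.clone().flatten()
    new_mask[prune_idx] = 0
    return new_mask.view_as(mask)
```

In a gradual pruning schedule, such a step would be called repeatedly during training with a small `n_to_prune` each time, so that the mask sparsity grows smoothly toward the 90%, 95%, or 98% target.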