Pruning deep neural networks is a widely used strategy to alleviate the computational burden of machine learning. Overwhelming empirical evidence suggests that pruned models retain very high accuracy even with only a tiny fraction of their original parameters. However, relatively little work has gone into characterising the resulting small networks beyond measuring their accuracy. In this paper, we use the sparse double descent approach to uniquely identify and characterise pruned models associated with classification tasks. We observe empirically that, for a given task, iterative magnitude pruning (IMP) tends to converge to networks of comparable size even when starting from full networks whose sizes span orders of magnitude. We analyse the best pruned models in a controlled experimental setup and show that their number of parameters reflects task difficulty and that they capture the true conditional probability distribution of the labels much better than full networks do. On real data, we similarly observe that pruned models are less prone to overconfident predictions. Our results suggest that pruned models obtained via IMP not only have advantageous computational properties but also provide a better representation of uncertainty in learning.
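To make the IMP loop concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes details the abstract does not specify: a logistic-regression "network" trained by full-batch gradient descent, global pruning of 20% of the surviving weights (smallest magnitude first) per round, and rewinding the surviving weights to their initial values before each retraining, as in lottery-ticket-style IMP. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def train(w_init, mask, X, y, lr=0.1, epochs=200):
    """Full-batch gradient descent on the logistic loss; pruned weights stay at zero."""
    w = w_init * mask
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # predicted P(y = 1 | x)
        grad = X.T @ (p - y) / len(y)            # gradient of the mean logistic loss
        w = (w - lr * grad) * mask               # re-apply the mask after each step
    return w

def magnitude_prune(w, mask, rate):
    """Zero out the `rate` fraction of surviving weights with the smallest |w|."""
    alive = np.flatnonzero(mask)
    n_prune = int(round(rate * alive.size))
    new_mask = mask.copy()
    if n_prune > 0:
        order = np.argsort(np.abs(w[alive]))     # ascending magnitude
        new_mask[alive[order[:n_prune]]] = 0.0
    return new_mask

def iterative_magnitude_pruning(X, y, rounds=5, rate=0.2, seed=0):
    """Train, prune, rewind to initialisation, and repeat (assumed IMP variant)."""
    rng = np.random.default_rng(seed)
    w_init = rng.normal(scale=0.1, size=X.shape[1])
    mask = np.ones_like(w_init)
    sizes = []                                   # number of surviving weights per round
    for _ in range(rounds):
        w = train(w_init, mask, X, y)            # rewind: always restart from w_init
        sizes.append(int(mask.sum()))
        mask = magnitude_prune(w, mask, rate)
    w_final = train(w_init, mask, X, y)          # retrain the final sparse model
    return w_final, mask, sizes

# Synthetic task (hypothetical): only the first 3 of 50 features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 50))
true_w = np.zeros(50)
true_w[:3] = [2.0, -3.0, 1.5]
y = (X @ true_w + 0.5 * rng.normal(size=500) > 0).astype(float)

w, mask, sizes = iterative_magnitude_pruning(X, y)
print(sizes, int(mask.sum()))
```

Rewinding to the initial weights, rather than continuing from the trained ones, is one common choice for IMP; the sketch uses it only for simplicity. Measuring how the final sparse size depends on the task, as the abstract describes, would require repeating the loop from full models of different sizes.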