Network pruning is an effective measure to alleviate the storage and computational burden of deep neural networks arising from their high overparameterization. This raises a fundamental question: how sparse can we prune a deep network without sacrificing performance? To address this question, we take a first-principles approach in this work: we directly impose the sparsity constraint on the original loss function and then characterize the necessary and sufficient conditions on the sparsity (\textit{which turn out to nearly coincide}) by leveraging the notion of \textit{statistical dimension} from convex geometry. Through this fundamental limit, we are able to identify two key factors that determine the pruning ratio limit, namely weight magnitude and network flatness. Generally speaking, the flatter the loss landscape or the smaller the weight magnitude, the smaller the pruning ratio. In addition, we provide efficient countermeasures to the challenges in computing the pruning limit, which involves accurate spectrum estimation of a large-scale, non-positive definite Hessian matrix. Moreover, through the lens of the pruning ratio threshold, we provide rigorous interpretations of several heuristics in existing pruning algorithms. Extensive experiments demonstrate that our theoretical pruning ratio threshold coincides very well with empirical results. All code is available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
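For reference, the \textit{statistical dimension} invoked above has a standard definition in convex geometry (due to Amelunxen, Lotz, McCoy, and Tropp); the formula below is that textbook definition, not necessarily the exact instantiation used in this paper's analysis. For a closed convex cone $C \subseteq \mathbb{R}^d$ with Euclidean projection $\Pi_C$,
\begin{equation}
\delta(C) \;=\; \mathbb{E}_{g \sim \mathcal{N}(0, I_d)}\left[ \left\| \Pi_C(g) \right\|_2^2 \right],
\end{equation}
and, in that theory, a convex program with random data undergoes a sharp phase transition between success and failure as the problem's dimension budget crosses $\delta(C)$ for the relevant cone.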
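Computing the pruning threshold requires the spectrum of a Hessian that is far too large to form explicitly and is not positive definite. Below is a minimal sketch of one standard, matrix-free approach to this subproblem: stochastic Lanczos quadrature driven by Hessian-vector products (Pearlmutter's trick). This is an illustrative assumption, not the paper's own estimator; the names \texttt{hvp} and \texttt{lanczos\_spectrum} and the single-probe setup are hypothetical simplifications.

\begin{verbatim}
import torch

def hvp(loss_fn, params, vec):
    # Hessian-vector product via double backprop (Pearlmutter's trick);
    # never materializes the full Hessian. params: list of tensors with
    # requires_grad=True; vec: flat 1-D tensor of matching total size.
    loss = loss_fn()
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    return torch.cat([h.reshape(-1) for h in
                      torch.autograd.grad(flat @ vec, params)])

def lanczos_spectrum(loss_fn, params, steps=50):
    # Lanczos tridiagonalization with one random probe vector; the Ritz
    # values/weights of T approximate the Hessian's spectral density.
    # Works for indefinite (non-positive definite) Hessians, since
    # Lanczos only requires symmetry.
    dim = sum(p.numel() for p in params)
    v = torch.randn(dim); v /= v.norm()
    v_prev, beta = torch.zeros(dim), 0.0
    alphas, betas = [], []
    for _ in range(steps):
        w = hvp(loss_fn, params, v)
        alpha = torch.dot(w, v).item()
        alphas.append(alpha)
        w = w - alpha * v - beta * v_prev
        beta = w.norm().item()
        if beta < 1e-8:        # invariant subspace found; stop early
            break
        betas.append(beta)
        v_prev, v = v, w / beta
    T = torch.diag(torch.tensor(alphas))
    for i, b in enumerate(betas[:len(alphas) - 1]):
        T[i, i + 1] = T[i + 1, i] = b
    evals, evecs = torch.linalg.eigh(T)
    return evals, evecs[0, :] ** 2   # Ritz values, quadrature weights
\end{verbatim}

In practice one would average over several random probes and mini-batches to stabilize the density estimate; off-the-shelf tools such as PyHessian implement this recipe.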