Overparameterization is one of the most prominent hallmarks of deep neural networks. Although it can yield outstanding generalization performance, it also imposes a substantial storage burden, which motivates the study of network pruning. A natural and fundamental question is: how sparse can we prune a deep network with almost no loss in performance? To address this question, we take a first-principles approach: by merely enforcing a sparsity constraint on the original loss function, we are able to characterize, from the perspective of high-dimensional geometry, the sharp phase transition point of the pruning ratio, which corresponds to the boundary between the feasible and the infeasible regimes. It turns out that the phase transition point of the pruning ratio equals the squared Gaussian width of a certain convex body determined by the $l_1$-regularized loss function, normalized by the original dimension of the parameters. As a byproduct, we provide a novel network pruning algorithm that is essentially a global one-shot pruning method. Furthermore, we provide efficient countermeasures for the challenges in computing the Gaussian width involved, including spectrum estimation for a large-scale Hessian matrix and handling a Hessian matrix that is not positive definite. Experiments show that the predicted pruning ratio threshold closely matches the actual value obtained experimentally, and that the proposed pruning algorithm achieves performance competitive with, or even better than, existing pruning algorithms. All code is available at: https://github.com/QiaozheZhang/Global-One-shot-Pruning
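To make the quantity "squared Gaussian width normalized by the dimension" concrete, the following is a minimal Monte Carlo sketch. It is an illustration only: the two sets used here (the unit $l_2$ ball and the unit $l_1$ ball) are toy stand-ins chosen because their support functions have closed forms, and they are not the convex body derived from the $l_1$-regularized loss in this work. The function names `gaussian_width`, `l2_ball_support`, and `l1_ball_support` are hypothetical and do not come from the released code.

```python
import numpy as np


def gaussian_width(support_fn, dim, n_samples=2000, seed=0):
    """Monte Carlo estimate of w(K) = E[sup_{x in K} <g, x>] with g ~ N(0, I_dim).

    `support_fn(g)` must return sup_{x in K} <g, x> for a single vector g.
    """
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_samples, dim))
    return float(np.mean([support_fn(row) for row in g]))


def l2_ball_support(g):
    # For the unit l2 ball, sup_{||x||_2 <= 1} <g, x> = ||g||_2.
    return np.linalg.norm(g)


def l1_ball_support(g):
    # For the unit l1 ball, sup_{||x||_1 <= 1} <g, x> = ||g||_inf.
    return np.max(np.abs(g))


dim = 500  # toy parameter dimension (assumption for illustration)

# Squared Gaussian width divided by the ambient dimension.
ratio_l2 = gaussian_width(l2_ball_support, dim) ** 2 / dim
ratio_l1 = gaussian_width(l1_ball_support, dim) ** 2 / dim

print(f"l2 ball: w^2 / d = {ratio_l2:.4f}")
print(f"l1 ball: w^2 / d = {ratio_l1:.4f}")
```

In this toy setting the $l_2$ ball gives a normalized squared width close to 1, while the $l_1$ ball gives a much smaller value, which illustrates how the geometry of the set, rather than the ambient dimension alone, determines the ratio.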