Pruning has emerged as the primary approach for limiting the resource requirements of large neural networks (NNs). Since the proposal of the lottery ticket hypothesis, researchers have focused on pruning either at initialization or after training. However, recent theoretical findings have shown that the sample efficiency of robust pruned models is proportional to the mutual information (MI) between the pruning masks and the model's training datasets, \textit{whether pruning is performed at initialization or after training}. In this paper, starting from these results, we introduce Mutual Information Preserving Pruning (MIPP), a structured, activation-based pruning technique applicable before or after training. The core principle of MIPP is to select nodes so as to conserve the MI shared between the activations of adjacent layers, and consequently between the data and the masks. Approaching the pruning problem in this way allows us to prove that there exists a function mapping the pruned upstream layer's activations to those of the downstream layer, which implies re-trainability. We demonstrate that MIPP consistently outperforms state-of-the-art methods, regardless of whether pruning is performed before or after training.
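As a rough formalization of this selection principle (notation ours, not taken from the paper): writing $A_\ell$ for the activations of layer $\ell$ on the training data and $A_\ell^{S}$ for those activations restricted to a retained node subset $S$, MIPP-style node selection can be read as keeping the smallest subset whose activations still carry approximately all of the information the full layer passes downstream,
\[
S^\star \;=\; \arg\min_{S}\, |S|
\quad \text{s.t.} \quad
I\!\left(A_\ell^{S};\, A_{\ell+1}\right) \;\ge\; (1-\varepsilon)\, I\!\left(A_\ell;\, A_{\ell+1}\right),
\]
for some small tolerance $\varepsilon \ge 0$. When the retained activations preserve this MI, a function from $A_\ell^{S}$ to $A_{\ell+1}$ can still exist, which is the re-trainability argument sketched above. This is an illustrative sketch under assumed notation ($A_\ell$, $S$, $\varepsilon$), not the exact objective optimized in the paper.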