The increasingly expensive training of ever larger models, such as Vision Transformers, motivates reusing the vast library of already trained state-of-the-art networks. However, their latency, high computational cost, and memory demands pose significant challenges for deployment, especially on resource-constrained hardware. While structured pruning methods can reduce these costs, they often require costly retraining, sometimes for hundreds of epochs, or even training from scratch to recover the accuracy lost to the structural modifications. Maintaining the performance of a trained model after structured pruning, and thereby avoiding extensive retraining, remains a challenge. To address this, we introduce Variance-Based Pruning, a simple, structured, one-shot pruning technique for efficiently compressing networks with minimal fine-tuning. Our approach first gathers activation statistics, which are used to select neurons for pruning. Simultaneously, the mean activations are integrated back into the model to preserve a high degree of performance. On ImageNet-1k recognition tasks, we demonstrate that, directly after pruning, DeiT-Base retains over 70% of its original performance and requires only 10 epochs of fine-tuning to regain 99% of the original accuracy, while reducing MACs by 35% and model size by 36%, thus speeding up the model by 1.44x. The code is available at: https://github.com/boschresearch/variance-based-pruning
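To make the two steps named in the abstract concrete (gathering activation statistics to select neurons, then integrating the mean activations back into the model), the following is a minimal sketch on a toy two-layer MLP in plain Python. It is an illustration under stated assumptions, not the implementation from the linked repository: we assume that the neurons with the lowest activation variance are the ones pruned, and that "integrating the mean" means adding each pruned neuron's mean contribution to the bias of the following layer. All function names (`hidden`, `mlp`, `variance_prune`) and the list-of-lists weight layout are hypothetical.

```python
import math
from statistics import fmean, pvariance


def gelu(v):
    """Exact GELU activation."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def hidden(x, W1, b1):
    """Hidden activations; W1 has one row per input and one column per neuron."""
    return [
        gelu(sum(xi * W1[i][j] for i, xi in enumerate(x)) + b1[j])
        for j in range(len(b1))
    ]


def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP; W2 has one row per hidden neuron."""
    h = hidden(x, W1, b1)
    return [
        sum(h[j] * W2[j][k] for j in range(len(h))) + b2[k]
        for k in range(len(b2))
    ]


def variance_prune(W1, b1, W2, b2, calib, n_prune):
    """Remove the n_prune lowest-variance hidden neurons (assumed criterion)
    and fold their mean activation into the next layer's bias (assumed
    form of the mean integration)."""
    # Step 1: gather per-neuron activation statistics on calibration inputs.
    acts = [hidden(x, W1, b1) for x in calib]
    per_neuron = list(zip(*acts))
    means = [fmean(col) for col in per_neuron]
    variances = [pvariance(col) for col in per_neuron]

    # Step 2: select the neurons whose activations vary the least.
    order = sorted(range(len(b1)), key=lambda j: variances[j])
    pruned = set(order[:n_prune])
    keep = [j for j in range(len(b1)) if j not in pruned]

    # Step 3: replace each pruned neuron by its mean, absorbed into b2.
    new_b2 = [
        b2[k] + sum(means[j] * W2[j][k] for j in pruned)
        for k in range(len(b2))
    ]

    # Step 4: physically drop the pruned columns of W1 and rows of W2.
    new_W1 = [[row[j] for j in keep] for row in W1]
    new_b1 = [b1[j] for j in keep]
    new_W2 = [W2[j] for j in keep]
    return new_W1, new_b1, new_W2, new_b2
```

Under these assumptions, a neuron whose activation is constant over the calibration data is removed without changing the network's output at all, and a neuron with small but nonzero variance introduces an error that scales with its deviation from the mean. This is one intuition for why accuracy could remain high directly after pruning.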