Driven by significant improvements in architectural design and training pipelines, computer vision has recently experienced dramatic progress in terms of accuracy on classic benchmarks such as ImageNet. These highly accurate models are challenging to deploy, as they appear harder to compress using standard techniques such as pruning. We address this issue by introducing the Correlation Aware Pruner (CAP), a new unstructured pruning framework that significantly pushes the compressibility limits for state-of-the-art architectures. Our method is based on two technical advancements: a new, theoretically justified pruner, which can handle complex weight correlations accurately and efficiently during the pruning process itself, and an efficient finetuning procedure for post-compression recovery. We validate our approach via extensive experiments on several modern vision models such as Vision Transformers (ViT), modern CNNs, and ViT-CNN hybrids, showing for the first time that these can be pruned to high sparsity levels (e.g., $\geq 75\%$) with low impact on accuracy ($\leq 1\%$ relative drop). Our approach is also compatible with structured pruning and quantization, and can lead to practical speedups of 1.5x to 2.4x without accuracy loss. To further showcase CAP's accuracy and scalability, we use it to show for the first time that extremely accurate large vision models, trained via self-supervised techniques, can also be pruned to moderate sparsities with negligible accuracy loss.