Machine learning pipelines for classification tasks often train a universal model to achieve accuracy across a broad range of classes. However, a typical user regularly encounters only a small subset of those classes. This disparity provides an opportunity to improve computational efficiency by tailoring models to user-specific classes. Existing works rely on unstructured pruning, which leaves non-zero values irregularly distributed across the model and is therefore poorly suited to hardware acceleration. Alternatively, some approaches employ structured pruning, such as channel pruning, but these tend to provide only limited compression and may reduce model accuracy. In this work, we propose CRISP, a novel pruning framework leveraging a hybrid structured sparsity pattern that combines fine-grained N:M structured sparsity with coarse-grained block sparsity. Our pruning strategy is guided by a gradient-based class-aware saliency score, allowing us to retain weights crucial for user-specific classes. CRISP achieves high accuracy with minimal memory consumption for popular models like ResNet-50, VGG-16, and MobileNetV2 on the ImageNet and CIFAR-100 datasets. Moreover, CRISP delivers up to 14$\times$ reduction in latency and energy consumption compared to existing pruning methods while maintaining comparable accuracy. Our code is available at https://github.com/shivmgg/CRISP/.
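To make the hybrid sparsity pattern concrete, the following is a minimal sketch of how an N:M mask and a block mask might be combined on a single row of weights. It is not the CRISP implementation. The function names (`nm_mask`, `block_mask`, `hybrid_prune`), the saliency formula `|w * g|` (a common first-order gradient-based score), and the rule of keeping a fixed number of blocks per row are all assumptions made for illustration. The gradients are assumed to come from a loss evaluated only on samples of the user-specific classes.

```python
def nm_mask(scores, n=2, m=4):
    """Keep the n highest-scoring entries in every group of m consecutive entries."""
    if len(scores) % m != 0:
        raise ValueError("row length must be a multiple of m")
    mask = [0] * len(scores)
    for start in range(0, len(scores), m):
        group = range(start, start + m)
        # Indices of the n largest scores within this group.
        keep = sorted(group, key=lambda i: scores[i], reverse=True)[:n]
        for i in keep:
            mask[i] = 1
    return mask


def block_mask(scores, block=4, keep_blocks=1):
    """Keep the keep_blocks blocks (of `block` consecutive entries) with the largest total score."""
    if len(scores) % block != 0:
        raise ValueError("row length must be a multiple of block")
    n_blocks = len(scores) // block
    totals = [sum(scores[b * block:(b + 1) * block]) for b in range(n_blocks)]
    kept = set(sorted(range(n_blocks), key=lambda b: totals[b], reverse=True)[:keep_blocks])
    return [1 if (i // block) in kept else 0 for i in range(len(scores))]


def hybrid_prune(weights, grads, n=2, m=4, block=4, keep_blocks=1):
    """Zero every weight that is rejected by either the block mask or the N:M mask."""
    # Assumed saliency: |weight * gradient|, with gradients taken on user-class data.
    scores = [abs(w * g) for w, g in zip(weights, grads)]
    coarse = block_mask(scores, block, keep_blocks)
    fine = nm_mask(scores, n, m)
    return [w if (c and f) else 0.0 for w, c, f in zip(weights, coarse, fine)]


# Hypothetical toy row of 8 weights and their class-specific gradients.
weights = [0.5, -1.0, 0.2, 0.8, 0.1, 0.3, -0.4, 0.9]
grads = [0.1, 0.2, 0.9, 0.05, 0.3, 0.4, 0.2, 0.1]
pruned = hybrid_prune(weights, grads)
print(pruned)
```

In this toy row the first block has the larger total saliency, so the second block is dropped entirely, and the 2:4 mask then keeps the two most salient weights inside the surviving block. The coarse mask removes whole blocks that can be skipped outright, while the fine mask keeps the surviving blocks in a regular N:M layout.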