Deploying deep neural networks (DNNs) on edge devices requires strong compression with minimal accuracy loss. This paper introduces Mix-and-Match Pruning, a globally guided, layer-wise sparsification framework that uses sensitivity scores and simple architectural rules to generate diverse, high-quality pruning configurations. The framework addresses a key limitation of single-strategy approaches: different layers and architectures respond differently to pruning, so applying one strategy everywhere is suboptimal. Mix-and-Match derives architecture-aware sparsity ranges (e.g., preserving normalization layers while pruning classifiers more aggressively) and systematically samples these ranges to produce ten strategies per sensitivity signal (magnitude, gradient, or their combination). This eliminates repeated pruning runs while offering deployment-ready accuracy-sparsity trade-offs. Experiments on CNNs and Vision Transformers demonstrate Pareto-optimal results, with Mix-and-Match reducing accuracy degradation on Swin-Tiny by 40% relative to standard single-criterion pruning. These findings show that coordinating existing pruning signals yields more reliable and efficient compressed models than introducing new criteria does.
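The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the range values in `SPARSITY_RANGES`, the function name `sample_strategies`, and the rule that scales each layer's sparsity by `1 - sensitivity` are all assumptions made for illustration, since the abstract specifies none of them. The sketch only shows how per-layer-type sparsity ranges and a per-layer sensitivity score could be combined into ten layer-wise configurations.

```python
from typing import Dict, List, Tuple

# Hypothetical (min, max) sparsity ranges per layer type.
# The abstract gives no numeric values; these are placeholders.
SPARSITY_RANGES: Dict[str, Tuple[float, float]] = {
    "norm": (0.0, 0.0),        # normalization layers are preserved
    "conv": (0.3, 0.7),
    "attention": (0.2, 0.6),
    "classifier": (0.5, 0.9),  # classifiers are pruned more aggressively
}


def sample_strategies(
    layers: Dict[str, str],
    sensitivity: Dict[str, float],
    num_strategies: int = 10,
) -> List[Dict[str, float]]:
    """Return `num_strategies` layer-wise sparsity configurations.

    `layers` maps layer name -> layer type (a key of SPARSITY_RANGES).
    `sensitivity` maps layer name -> score in [0, 1]; higher means the
    layer is more sensitive and should be pruned less (assumed rule).
    """
    strategies = []
    for k in range(num_strategies):
        # Sweep evenly from the low end (t = 0) to the high end (t = 1).
        t = k / (num_strategies - 1) if num_strategies > 1 else 0.0
        config = {}
        for name, kind in layers.items():
            lo, hi = SPARSITY_RANGES[kind]
            # Clamp the score; layers without a score count as insensitive.
            s = min(max(sensitivity.get(name, 0.0), 0.0), 1.0)
            # Sensitive layers stay closer to the low end of their range.
            config[name] = round(lo + (hi - lo) * t * (1.0 - s), 4)
        strategies.append(config)
    return strategies


# Toy model: layer names and scores are made up for the example.
toy_layers = {"bn1": "norm", "conv1": "conv", "head": "classifier"}
toy_sensitivity = {"conv1": 0.5, "head": 0.0}
toy_strategies = sample_strategies(toy_layers, toy_sensitivity)
```

Under these assumptions, one call corresponds to one sensitivity signal; calling the function once each with magnitude-based, gradient-based, and combined scores would give ten configurations per signal, each of which could then be applied and evaluated to trace an accuracy-sparsity trade-off.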