Pruning is a promising approach to compressing complex deep learning models so that they can be deployed on resource-constrained edge devices. However, many existing pruning solutions rely on unstructured pruning, which yields models that cannot run efficiently on commodity hardware, and they require users to manually explore and tune the pruning process, which is time-consuming and often leads to sub-optimal results. To address these limitations, this paper presents an adaptive, activation-based, structured pruning approach that automatically and efficiently generates small, accurate, and hardware-efficient models that meet user requirements. First, it proposes iterative structured pruning that uses activation-based attention feature maps to identify and prune unimportant filters. Then, it proposes adaptive pruning policies that automatically meet the pruning objectives of accuracy-critical, memory-constrained, and latency-sensitive tasks. A comprehensive evaluation shows that the proposed method substantially outperforms state-of-the-art structured pruning methods on the CIFAR-10 and ImageNet datasets. For example, on ResNet-56 with CIFAR-10, without any accuracy drop, our method achieves the largest parameter reduction (79.11%), outperforming related work by 22.81% to 66.07%, and the largest FLOPs reduction (70.13%), outperforming related work by 14.13% to 26.53%.
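The idea of scoring filters by activation-based attention and pruning the weakest ones can be illustrated with a minimal NumPy sketch. The abstract does not give the exact scoring formula, so the score used here (the sum of |activation|^p over spatial positions, averaged over a batch) is an assumption, as are the function names `filter_attention_scores`, `select_filters_to_keep`, and `prune_conv_layer` and the `prune_ratio` parameter. The sketch shows a single pruning step; an iterative scheme would presumably repeat it, with retraining between rounds.

```python
import numpy as np


def filter_attention_scores(activations, p=2):
    """Score each filter from its output feature maps.

    activations: array of shape (batch, channels, height, width).
    Returns one score per channel. The |a|^p attention form is an
    assumption; the abstract does not specify the formula.
    """
    # Sum |a|^p over spatial positions, giving shape (batch, channels).
    per_sample = (np.abs(activations) ** p).sum(axis=(2, 3))
    # Average over the batch so each filter gets a single score.
    return per_sample.mean(axis=0)


def select_filters_to_keep(scores, prune_ratio):
    """Return sorted indices of filters that survive pruning."""
    n_prune = int(len(scores) * prune_ratio)
    order = np.argsort(scores, kind="stable")  # lowest scores first
    return np.sort(order[n_prune:])


def prune_conv_layer(weight, next_weight, keep):
    """Structured pruning: drop whole filters and the matching input
    channels of the following layer.

    weight: (out_channels, in_channels, k, k)
    next_weight: (next_out_channels, out_channels, k, k)
    """
    return weight[keep], next_weight[:, keep]


# Toy usage with four filters whose activations differ only in scale.
scales = np.array([0.1, 2.0, 0.5, 3.0]).reshape(1, 4, 1, 1)
acts = np.ones((2, 4, 3, 3)) * scales
scores = filter_attention_scores(acts)
keep = select_filters_to_keep(scores, prune_ratio=0.5)

weight = np.zeros((4, 3, 3, 3))
next_weight = np.zeros((8, 4, 3, 3))
pruned_w, pruned_next = prune_conv_layer(weight, next_weight, keep)
print(keep, pruned_w.shape, pruned_next.shape)
```

Because entire filters are removed, the pruned layers remain dense tensors with smaller shapes, which is what lets structured pruning run efficiently on commodity hardware without sparse kernels.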