Recently, vision transformer models have become prominent across a range of tasks. These models, however, usually incur high computational costs and heavy memory requirements, making them impractical for deployment on edge platforms. Recent studies prune transformers in an unexplainable manner that overlooks the relationship between the model's internal units and the target class, leading to inferior performance. To alleviate this problem, we propose a novel explainable pruning framework, dubbed X-Pruner, which is designed around the explainability of the pruning criterion. Specifically, to measure each prunable unit's contribution to predicting each target class, we propose a novel explainability-aware mask that is learned in an end-to-end manner. Then, to preserve the most informative units and learn the layer-wise pruning rate, we adaptively search for the layer-wise threshold that separates unpruned from pruned units based on their explainability-aware mask values. To verify and evaluate our method, we apply X-Pruner to representative transformer models, including DeiT and Swin Transformer. Comprehensive simulation results demonstrate that the proposed X-Pruner outperforms state-of-the-art black-box methods, with significantly reduced computational costs and only slight performance degradation.
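The thresholding step described above can be illustrated with a minimal sketch. It is not the X-Pruner implementation: the function names `prune_layer` and `pruning_rate` are hypothetical, the mask values are made up for illustration, the per-class mask is averaged over classes as an assumed aggregation rule, and the layer-wise threshold is passed in as a fixed number rather than searched adaptively.

```python
import numpy as np


def prune_layer(mask, threshold):
    """Return a boolean keep-vector for one layer.

    mask: array of shape (num_classes, num_units) with values in [0, 1],
          standing in for a learned explainability-aware mask.
    threshold: layer-wise cutoff (fixed here; assumed, not searched).
    """
    # Assumption: a unit's overall score is its mean mask value across classes.
    unit_scores = mask.mean(axis=0)
    return unit_scores >= threshold


def pruning_rate(keep):
    """Fraction of units removed from the layer."""
    return 1.0 - keep.mean()


# Illustrative mask for 2 classes and 4 prunable units (made-up values).
mask = np.array([[0.9, 0.1, 0.6, 0.2],
                 [0.7, 0.3, 0.8, 0.0]])
keep = prune_layer(mask, threshold=0.5)
rate = pruning_rate(keep)
```

Here the layer's pruning rate is a consequence of its threshold rather than a preset value, which mirrors the idea that the threshold determines the layer-wise pruning rate.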