Group channel pruning and spatial attention distilling for object detection

Because neural networks are over-parameterized, many model compression methods based on pruning and quantization have emerged. They are effective at reducing model size, parameter count, and computational complexity. However, most models compressed by such methods require special hardware or software support, which increases deployment cost. Moreover, these methods are mainly applied to classification tasks and are rarely applied directly to detection tasks. To address these issues, we introduce a three-stage model compression method for object detection networks: dynamic sparse training, group channel pruning, and spatial attention distilling. First, to identify unimportant channels while keeping a good balance between sparsity and accuracy, we propose a dynamic sparse training method that introduces a variable sparse rate, which changes as network training progresses. Second, to reduce the impact of pruning on accuracy, we propose a novel pruning method called group channel pruning. Specifically, we divide the network into multiple groups according to the scales of the feature layers and the similarity of module structures, and then prune the channels in each group with a different pruning threshold. Finally, to recover the accuracy of the pruned network, we apply an improved knowledge distillation method to it. In particular, we extract spatial attention information from feature maps of specific scales in each group and use it as the knowledge for distillation. In our experiments, we use YOLOv4 as the object detection network and PASCAL VOC as the training dataset. Our method reduces the number of model parameters by 64.7% and the computational cost by 34.9%.
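The abstract does not specify the sparsity objective or how the sparse rate varies. The sketch below assumes the common network-slimming setup: an L1 penalty on batch-normalization scaling factors γ, whose weight λ(epoch) plays the role of the variable sparse rate. The warm-up/decay schedule and the names `sparse_rate` and `l1_subgradient_step` are hypothetical illustrations, not the authors' implementation.

```python
def _sign(x):
    """Subgradient of |x|: +1, -1, or 0 at x == 0."""
    return (x > 0) - (x < 0)


def sparse_rate(epoch, total_epochs, max_rate=1e-3, warmup_frac=0.5, final_frac=0.1):
    """Hypothetical variable sparse rate lambda(epoch).

    Ramps linearly up to max_rate during the first warmup_frac of training,
    then decays linearly toward final_frac * max_rate so the network can
    regain accuracy once sparsity has been induced.
    """
    warmup = max(1, int(total_epochs * warmup_frac))
    if epoch < warmup:
        return max_rate * (epoch + 1) / warmup
    remaining = max(1, total_epochs - warmup)
    progress = (epoch - warmup) / remaining
    return max_rate * (1.0 - (1.0 - final_frac) * progress)


def l1_subgradient_step(gammas, grads, lr, lam):
    """One SGD step on BN scaling factors with the penalty lam * sum(|gamma|)."""
    return [g - lr * (dg + lam * _sign(g)) for g, dg in zip(gammas, grads)]


# Usage: lambda rises during warm-up, then shrinks, over a 10-epoch run.
rates = [sparse_rate(e, 10) for e in range(10)]
```

Channels whose γ is driven toward zero by the penalty become the candidates for removal in the pruning stage; a time-varying λ is one way to trade sparsity against accuracy during training.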
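Group channel pruning can be illustrated with per-group thresholds. The abstract states only that each group gets its own threshold. The group names (`P3`, `P5`), the per-group prune ratios, the quantile rule that turns a ratio into a threshold, and the `min_keep` safeguard are all assumptions made for this sketch.

```python
def group_thresholds(groups, prune_ratios):
    """groups: {group: {layer: [gamma, ...]}}; prune_ratios: {group: ratio in [0, 1)}.

    Returns one threshold per group: the |gamma| value at the given quantile
    of all channels in that group (hypothetical rule).
    """
    thresholds = {}
    for name, layers in groups.items():
        values = sorted(abs(g) for gammas in layers.values() for g in gammas)
        k = int(len(values) * prune_ratios[name])
        thresholds[name] = values[k] if k < len(values) else float("inf")
    return thresholds


def keep_masks(groups, thresholds, min_keep=1):
    """Per-layer boolean masks: True means the channel is kept."""
    masks = {}
    for name, layers in groups.items():
        t = thresholds[name]
        for layer, gammas in layers.items():
            mask = [abs(g) >= t for g in gammas]
            if sum(mask) < min_keep:
                # Keep the largest-|gamma| channels so the layer is not emptied.
                order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]), reverse=True)
                top = set(order[:min_keep])
                mask = [i in top for i in range(len(gammas))]
            masks[layer] = mask
    return masks


# Usage with two hypothetical groups at different feature scales.
groups = {
    "P3": {"conv_a": [0.9, 0.01, 0.5, 0.02]},
    "P5": {"conv_b": [0.3, 0.25, 0.2, 0.1]},
}
thresholds = group_thresholds(groups, {"P3": 0.5, "P5": 0.25})
masks = keep_masks(groups, thresholds)
```

With these example values, a single global threshold of 0.5 would remove every channel of `conv_b`, whereas the per-group threshold (0.2 for `P5`) keeps three of its four channels. This illustrates one way a global threshold can hurt accuracy when γ magnitudes differ across scales.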
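The abstract does not give the formula for the spatial attention knowledge. The sketch below swaps in the attention-transfer formulation: collapse the channels as the sum over c of |F_c|^p, L2-normalize the flattened map, and penalize the L2 distance between student and teacher maps at each selected scale. The function names and the choice p = 2 are assumptions, not the paper's definition.

```python
import math


def spatial_attention(feature, p=2):
    """feature: C x H x W nested lists. Returns the flattened, L2-normalized
    map A[i, j] = sum_c |F_c[i][j]|^p."""
    h, w = len(feature[0]), len(feature[0][0])
    att = [sum(abs(ch[i][j]) ** p for ch in feature) for i in range(h) for j in range(w)]
    norm = math.sqrt(sum(a * a for a in att)) or 1.0  # guard against all-zero maps
    return [a / norm for a in att]


def attention_distill_loss(student_feats, teacher_feats, p=2):
    """Sum over selected scales of the L2 distance between normalized attention maps."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        a_s, a_t = spatial_attention(fs, p), spatial_attention(ft, p)
        loss += math.sqrt(sum((x - y) ** 2 for x, y in zip(a_s, a_t)))
    return loss


# Usage: a 2-channel teacher map vs. a 1-channel (pruned) student map, both 2x2.
teacher = [[[1, 0], [0, 0]], [[2, 0], [0, 0]]]
student_same = [[[3, 0], [0, 0]]]
student_diff = [[[0, 1], [0, 0]]]
```

Because the channel dimension is summed out, a pruned student with fewer channels can be compared directly with the unpruned teacher, provided the two feature maps share the same spatial size.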

翻译：由于神经网络存在过度参数化问题，基于剪枝与量化的模型压缩方法应运而生。这些方法在减小模型体积、参数数量和计算复杂度方面效果显著。然而，大多数通过此类方法压缩的模型需要特殊软硬件支持，增加了部署成本。此外，这些方法主要用于分类任务，很少直接应用于检测任务。为解决上述问题，我们针对目标检测网络提出了一种三阶段模型压缩方法：动态稀疏训练、分组通道剪枝和空间注意力蒸馏。首先，为筛选出网络中不重要的通道并在稀疏性与准确性之间保持良好平衡，我们提出了一种动态稀疏训练方法，该方法引入可变稀疏率，且稀疏率会随网络训练过程动态变化。其次，为降低剪枝对网络准确性的影响，我们提出了一种名为分组通道剪枝的新型剪枝方法。具体而言，我们根据特征图层级和网络模块结构的相似性将网络划分为多个分组，随后对每个分组内的通道采用不同的剪枝阈值进行剪枝。最后，为恢复剪枝后网络的准确性，我们采用改进的知识蒸馏方法对剪枝网络进行优化。特别地，我们从每个分组中特定尺度的特征图中提取空间注意力信息作为蒸馏知识。实验中，我们以YOLOv4作为目标检测网络，PASCAL VOC作为训练数据集。本方法使模型参数减少64.7%，计算量降低34.9%。