Filter Pruning based on Information Capacity and Independence

Filter pruning has been widely used for the compression and acceleration of convolutional neural networks (CNNs). However, most existing methods still suffer from heavy computational cost and biased filter selection. Moreover, most filter-evaluation designs lack interpretability because they lack appropriate theoretical guidance. In this paper, we propose a novel filter pruning method that evaluates filters in an interpretable, multi-perspective and data-free manner. We introduce information capacity, a metric that represents the amount of information contained in a filter. Given the interpretability and validity of information entropy, we propose to use it as a quantitative index of information quantity. In addition, we experimentally show an obvious correlation between the entropy of the feature map and the corresponding filter, and on this basis propose an interpretable, data-driven scheme to measure the information capacity of the filter. Further, we introduce information independence, another metric that represents the correlation among different filters. Consequently, the least important filters, which have less information capacity and less information independence, are pruned. We evaluate our method on two benchmarks using multiple representative CNN architectures, including VGG-16 and ResNet. On CIFAR-10, we reduce floating-point operations (FLOPs) by 71.9% and parameters by 69.4% for ResNet-110 with a 0.28% accuracy increase. On ILSVRC-2012, we reduce FLOPs by 76.6% and parameters by 68.6% for ResNet-50 with only a 2.80% accuracy decrease, which outperforms the state of the art.

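To make the two metrics in the abstract concrete, the following is a minimal sketch of how a data-free scoring step might look. It is not the authors' implementation: the abstract does not give the exact formulas, so the histogram-based entropy estimate, the cosine-similarity notion of independence, the min-max rescaling, and the equal-weight sum are all assumptions made for illustration. The function names `information_capacity`, `information_independence`, and `select_filters_to_prune`, and the parameters `bins` and `prune_ratio`, are hypothetical.

```python
import numpy as np


def information_capacity(filters, bins=16):
    """Shannon entropy (in bits) of each filter's weight histogram.

    Assumption: entropy is estimated from a histogram with a bin range
    shared by all filters in the layer; the paper may define it differently.
    """
    flat = filters.reshape(filters.shape[0], -1)
    lo, hi = flat.min(), flat.max()
    scores = np.empty(flat.shape[0])
    for i, w in enumerate(flat):
        counts, _ = np.histogram(w, bins=bins, range=(lo, hi))
        p = counts[counts > 0] / counts.sum()  # drop empty bins: 0*log(0) := 0
        scores[i] = -(p * np.log2(p)).sum()
    return scores


def information_independence(filters, eps=1e-12):
    """One minus the mean absolute cosine similarity to the other filters.

    Assumption: "independence" is approximated by low cosine similarity.
    Requires at least two filters.
    """
    flat = filters.reshape(filters.shape[0], -1)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + eps)
    sim = np.abs(unit @ unit.T)
    np.fill_diagonal(sim, 0.0)  # ignore self-similarity
    return 1.0 - sim.sum(axis=1) / (flat.shape[0] - 1)


def select_filters_to_prune(filters, prune_ratio=0.5, bins=16):
    """Return sorted indices of the lowest-scoring filters in one layer."""

    def rescale(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    # Assumption: both metrics are min-max rescaled and summed with equal weight.
    score = rescale(information_capacity(filters, bins)) + rescale(
        information_independence(filters)
    )
    n_prune = int(prune_ratio * filters.shape[0])
    return np.sort(np.argsort(score)[:n_prune])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(size=(8, 3, 3, 3))  # (out_channels, in_channels, kH, kW)
    layer[1] = layer[0]  # an exact duplicate, so filters 0 and 1 are redundant
    print(select_filters_to_prune(layer, prune_ratio=0.25))
```

Because the scores are computed from the weights alone, this sketch needs no input data, which matches the data-free evaluation the abstract describes. How the two metrics are actually combined and normalized in the paper would have to be taken from the full text.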