Convolutional neural networks (CNNs) are commonplace in high-performing solutions to many real-world problems, such as audio classification. CNNs have many parameters and filters, some of which have a larger impact on performance than others. Networks may therefore contain many unnecessary filters, which increase a CNN's computation and memory requirements while providing limited performance benefits. To make CNNs more efficient, we propose a pruning framework that eliminates filters with the highest "commonality". We measure this commonality using the graph-theoretic concept of "centrality". We hypothesise that a filter with high centrality should be eliminated because it represents commonality: other filters can take its place without much effect on the performance of the network. We evaluate the proposed framework experimentally on acoustic scene classification and audio tagging. On the DCASE 2021 Task 1A baseline network, our proposed method reduces computations per inference by 71\% and the number of parameters by 50\%, with a drop in accuracy of less than two percentage points compared to the original network. For large-scale CNNs such as PANNs designed for audio tagging, our method reduces computations per inference by 24\% and the number of parameters by 41\%, with a slight improvement in performance.
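The pruning idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the filter graph is built or which centrality measure is used, so everything below is an assumption made for illustration: filters are treated as graph nodes, edge weights are cosine similarities between flattened filters (negative values clipped to zero), and centrality is the weighted degree of each node. The function names `centrality_scores` and `select_filters_to_keep` and the `prune_fraction` parameter are hypothetical.

```python
import numpy as np


def centrality_scores(filters: np.ndarray) -> np.ndarray:
    """Weighted degree centrality of each filter in a similarity graph.

    `filters` has shape (n_filters, in_channels, height, width).
    Assumption: edge weight = cosine similarity between flattened filters.
    """
    n = filters.shape[0]
    flat = filters.reshape(n, -1).astype(np.float64)
    # Normalise each filter to unit length (guard against all-zero filters).
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    # A node has no edge to itself.
    np.fill_diagonal(sim, 0.0)
    # Assumption: negative similarities contribute no edge weight.
    return np.clip(sim, 0.0, None).sum(axis=1)


def select_filters_to_keep(filters: np.ndarray, prune_fraction: float) -> np.ndarray:
    """Return sorted indices of the filters that survive pruning.

    The filters with the highest centrality are removed; at least one
    filter is always kept.
    """
    scores = centrality_scores(filters)
    n = len(scores)
    n_prune = min(int(prune_fraction * n), n - 1)
    # Sort by descending centrality; stable sort keeps ties in index order.
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[n_prune:])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = rng.normal(size=(16, 3, 3, 3))  # toy layer with 16 filters
    keep = select_filters_to_keep(layer, prune_fraction=0.5)
    pruned_layer = layer[keep]
    print(keep, pruned_layer.shape)
```

In a real network, removing filters from one layer also means removing the matching input channels from the next layer, and pruned networks are typically fine-tuned afterwards to recover accuracy. Neither step is shown here.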