Recent works show that reducing the number of layers in a convolutional neural network can enhance efficiency while maintaining the performance of the network. Existing depth compression methods remove redundant non-linear activation functions and merge consecutive convolution layers into a single layer. However, these methods suffer from a critical drawback: the kernel size of the merged layers becomes larger, significantly undermining the latency reduction gained from reducing the depth of the network. We show that this problem can be addressed by jointly pruning convolution layers and activation functions. To this end, we propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove in order to achieve a desired inference speed-up while minimizing performance loss. Since the corresponding selection problem involves an exponential search space, we formulate a novel surrogate optimization problem and solve it efficiently via dynamic programming. Empirical results demonstrate that our method consistently outperforms existing depth compression and layer pruning methods on various network architectures, on both image classification and generation tasks. We release the code at https://github.com/snu-mllab/LayerMerge.
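To give a sense of why the selection problem is tractable with dynamic programming, the following is a minimal toy sketch, not the paper's actual surrogate formulation. It assumes (hypothetically) that merging a contiguous block of layers `i..j` has a precomputed quality penalty `loss[i][j]` and a measured latency `lat[i][j]` (in integer ticks), and finds the partition of the network into merged blocks that minimizes total penalty under a latency budget. All names, costs, and the cost tables themselves are illustrative assumptions.

```python
def min_loss_partition(n_layers, loss, lat, budget):
    """Toy DP: partition layers 0..n_layers-1 into contiguous merged blocks,
    minimizing total loss subject to an integer latency budget.
    loss[i][j], lat[i][j] are hypothetical costs of merging layers i..j."""
    INF = float("inf")
    # dp[i][t] = min loss to compress the first i layers using exactly t latency ticks
    dp = [[INF] * (budget + 1) for _ in range(n_layers + 1)]
    dp[0][0] = 0.0
    parent = [[None] * (budget + 1) for _ in range(n_layers + 1)]
    for i in range(1, n_layers + 1):
        for t in range(budget + 1):
            for j in range(i):  # last merged block covers layers j..i-1
                c = lat[j][i - 1]
                if c <= t and dp[j][t - c] + loss[j][i - 1] < dp[i][t]:
                    dp[i][t] = dp[j][t - c] + loss[j][i - 1]
                    parent[i][t] = (j, t - c)
    t_best = min(range(budget + 1), key=lambda t: dp[n_layers][t])
    if dp[n_layers][t_best] == INF:
        return None  # no partition fits the budget
    blocks, i, t = [], n_layers, t_best
    while i > 0:
        j, t_prev = parent[i][t]
        blocks.append((j, i - 1))
        i, t = j, t_prev
    return dp[n_layers][t_best], blocks[::-1]


# Illustrative 3-layer example: merging more layers is faster but lossier.
lat = [[2, 3, 4], [0, 2, 3], [0, 0, 2]]
loss = [[0.0, 0.5, 1.2], [0, 0.0, 0.5], [0, 0, 0.0]]
best_loss, blocks = min_loss_partition(3, loss, lat, budget=5)
```

The DP runs in O(N^2 · T) time for N layers and T latency ticks, which is why an exponential search over merge choices collapses to a polynomial-time table fill once latencies are discretized.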