Convolutional neural networks (CNNs) have become widely used, but their computational and memory demands pose challenges for resource-constrained computing systems, particularly when real-time performance is required. To relieve this burden, model compression has become an important research focus. Approaches such as quantization, pruning, early exit, and knowledge distillation have each proven effective at reducing redundancy in neural networks. On closer examination, each approach exploits its own distinct properties to compress a network, and the approaches can complement one another when combined. To explore these interactions and reap the benefits of their complementary features, we propose the Chain of Compression, which focuses on the order in which these common techniques are combined and applied to compress a neural network. Validated on image-based regression and classification networks across different datasets, the proposed Chain of Compression reduces computational cost by 100 to 1000 times with negligible accuracy loss compared with the baseline model.
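To illustrate what "chaining" compression techniques means, here is a minimal sketch that composes two of the techniques named above, magnitude pruning followed by symmetric 8-bit quantization, as an ordered sequence of passes over a flat list of weights. This is not the authors' method: the helper names (`magnitude_prune`, `quantize_int8`, `chain`, `storage_ratio`), the fixed pruning-then-quantization order, and the storage estimate (which ignores sparse-index overhead) are all hypothetical assumptions chosen for illustration.

```python
from typing import Callable, List

Weights = List[float]
Pass = Callable[[Weights], Weights]


def magnitude_prune(sparsity: float) -> Pass:
    """Hypothetical pass: zero out the smallest-magnitude `sparsity` fraction of weights."""
    def apply(weights: Weights) -> Weights:
        n_prune = int(len(weights) * sparsity)
        # Indices sorted by absolute value, smallest first.
        order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
        pruned = set(order[:n_prune])
        return [0.0 if i in pruned else w for i, w in enumerate(weights)]
    return apply


def quantize_int8(weights: Weights) -> Weights:
    """Hypothetical pass: symmetric uniform 8-bit quantization, returned dequantized."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 127.0
    # Map each weight to an integer level in [-127, 127], then back to float.
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]


def chain(passes: List[Pass]) -> Pass:
    """Compose compression passes; they are applied in list order (left to right)."""
    def apply(weights: Weights) -> Weights:
        for p in passes:
            weights = p(weights)
        return weights
    return apply


def storage_ratio(n_total: int, n_nonzero: int, bits_after: int,
                  bits_before: int = 32) -> float:
    """Dense storage at `bits_before` bits vs. nonzero-only storage at `bits_after`
    bits. Assumption: sparse-index overhead is ignored."""
    return (n_total * bits_before) / (n_nonzero * bits_after)


if __name__ == "__main__":
    w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.02, 0.3]
    compressed = chain([magnitude_prune(0.5), quantize_int8])(w)
    nnz = sum(1 for x in compressed if x != 0.0)
    print(nnz)                             # 4
    print(storage_ratio(len(w), nnz, 8))   # 8.0
```

Here pruning halves the number of stored weights and quantization cuts each stored weight from 32 to 8 bits, so the two reductions multiply (2 × 4 = 8). Swapping the list order, for example quantizing before pruning, yields a different chain; choosing that order is the kind of question the Chain of Compression is concerned with.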
翻译:卷积神经网络(CNN)已获得显著普及,但其计算和内存密集性对资源受限的计算系统构成挑战,特别是在实时性能的要求下。为减轻这一负担,模型压缩已成为重要的研究焦点。量化、剪枝、提前退出和知识蒸馏等多种方法已证明可以减少神经网络冗余的效果。通过更仔细的研究可以发现,每种方法都利用其独特特征来压缩神经网络,且当组合使用时它们还能表现出互补行为。为探索这些相互作用并利用互补特征的优势,我们提出压缩链(Chain of Compression),该方法基于组合序列应用这些常见技术来压缩神经网络。在基于图像的回归和分类网络以及不同数据集上的验证表明,与基线模型相比,我们提出的压缩链可将计算成本显著压缩100-1000倍,且精度损失可忽略不计。