Structured pruning and quantization are promising approaches for reducing the inference time and memory footprint of neural networks. However, most existing methods require the original training dataset to fine-tune the model. This is resource-intensive, and it is infeasible for applications with sensitive or proprietary data because of privacy and security concerns. A few data-free methods have been proposed to address this problem, but they perform data-free pruning and quantization separately and therefore do not exploit the complementarity of the two. In this paper, we propose a novel framework named Unified Data-Free Compression (UDFC), which performs pruning and quantization simultaneously without any data or fine-tuning process. Specifically, UDFC starts from the assumption that part of the information in a damaged (e.g., pruned or quantized) channel can be preserved by a linear combination of other channels, and from this assumption derives a reconstruction form that restores the information lost to compression. We then formulate the reconstruction error between the original network and its compressed counterpart and theoretically derive a closed-form solution. We evaluate UDFC on large-scale image classification and obtain significant improvements across various network architectures and compression methods. For example, on ResNet-34 with a 30% pruning ratio and 6-bit quantization, we achieve a 20.54% accuracy improvement on the ImageNet dataset compared to the SOTA method.
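The linear-combination assumption can be illustrated with a toy sketch. This is not UDFC's closed-form solution: it assumes a single fully connected layer with no batch normalization or nonlinearity, and it assumes the combination coefficients are already known (here they are chosen by hand so that the relation holds exactly). The names `fold_pruned_channel`, `linear`, and `coeffs` are hypothetical and introduced only for this example.

```python
def fold_pruned_channel(weights, pruned, coeffs):
    """Fold a pruned input channel into the kept channels of the next layer.

    weights: rows are output units, columns are input channels (assumed layout).
    pruned:  index of the input channel being removed.
    coeffs:  {kept_index: s_j}, assuming x[pruned] ~= sum_j s_j * x[j].
    """
    kept = [j for j in range(len(weights[0])) if j != pruned]
    new_weights = []
    for row in weights:
        # Each kept weight absorbs its share of the pruned channel's weight.
        new_row = [row[j] + coeffs.get(j, 0.0) * row[pruned] for j in kept]
        new_weights.append(new_row)
    return new_weights, kept


def linear(weights, x):
    """Plain matrix-vector product, standing in for the next layer."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


# Toy input: channel 2 is exactly 0.5 * channel 0 + 2.0 * channel 1.
x = [1.0, 3.0, 0.5 * 1.0 + 2.0 * 3.0]
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# Remove channel 2 and compensate the remaining weights.
W_small, kept = fold_pruned_channel(W, pruned=2, coeffs={0: 0.5, 1: 2.0})
x_small = [x[j] for j in kept]

original = linear(W, x)
compensated = linear(W_small, x_small)
print(original, compensated)
```

In this toy case the two outputs agree because the pruned channel is an exact linear combination of the kept ones. When the relation is only approximate, a residual reconstruction error remains, which is the quantity the abstract says is minimized in closed form.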