Network pruning and quantization are both effective approaches to deep model compression. To obtain a highly compact model, most methods first prune the network and then quantize the pruned model. However, this sequential strategy ignores the fact that pruning and quantization affect each other, so performing them separately may lead to sub-optimal performance. Performing them jointly is therefore essential, yet striking a trade-off between the two is non-trivial. Moreover, existing compression methods often rely on pre-defined compression configurations. Some attempts have been made to search for optimal configurations, but the search may incur a prohibitive optimization cost. To address these issues, we devise a simple yet effective method named Single-path Bit Sharing (SBS). Specifically, we first treat network pruning as a special case of quantization, which provides a unified view of the two. We then introduce a single-path model that encodes all candidate compression configurations. In this way, the configuration search problem is transformed into a subset selection problem, which significantly reduces the number of parameters, the computational cost, and the optimization difficulty. Building on the single-path model, we further introduce learnable binary gates to encode the choice of bitwidth. Training the binary gates jointly with the network parameters allows the compression configuration of each layer to be determined automatically. Extensive experiments on both CIFAR-100 and ImageNet show that SBS significantly reduces computational cost while achieving promising performance. For example, our SBS-compressed MobileNetV2 achieves a 22.6x reduction in Bit-Operations (BOPs) with only a 0.1% drop in Top-1 accuracy.
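The unified view of pruning and quantization, together with the binary gates that select a bitwidth, can be illustrated with a small sketch. The abstract does not spell out the formulation, so everything below is an assumption made for illustration: the uniform quantizer on [0, 1], the candidate bitwidths (2, 4, 8), the nested gating order, and the function names `quantize` and `gated_quantize`. The gates here are fixed by hand rather than learned. The idea shown is that a higher-bitwidth value can be written as a lower-bitwidth value plus a residual, so one set of values serves every candidate configuration, and closing every gate yields zero, which corresponds to pruning.

```python
def quantize(x, bits):
    """Uniformly quantize x, clipped to [0, 1], onto 2**bits levels.

    Illustrative quantizer only; not taken from the source.
    """
    levels = 2 ** bits - 1
    x = min(max(x, 0.0), 1.0)
    return round(x * levels) / levels


def gated_quantize(x, gates, bitwidths=(2, 4, 8)):
    """Build a quantized value from nested residuals selected by binary gates.

    gates[i] == 1 enables bitwidths[i]. A closed gate also disables every
    higher bitwidth. If the first gate is closed the result is 0.0, which
    plays the role of pruning in this sketch.
    """
    out = 0.0
    prev = 0.0
    for bits, gate in zip(bitwidths, gates):
        if not gate:
            break
        current = quantize(x, bits)
        out += current - prev  # residual contributed by this bitwidth
        prev = current
    return out


if __name__ == "__main__":
    x = 0.3
    for gates in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
        print(gates, gated_quantize(x, gates))
```

In this sketch, choosing a configuration amounts to choosing which gates are open, which is one way to read the abstract's description of configuration search as subset selection. Making the gates learnable would require a differentiable relaxation or a gradient estimator, which is omitted here.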